IPTV Reference Book
Table of Contents

Preamble
Compression
    Typical bitrates
    Lossy and lossless compression
    Group of pictures
    Other techniques
    Standard video codecs
    Audio compression
OSI model applied to IPTV
    Encapsulation
    Connected vs not connected protocols
    Gateway
    IPTV Gateway
Satellite Television
    Positioning
    Frequency range
    Reception
    Selection
        Quadrant selection
        Sec Tone burst
        DiSEqC
    Transponders and multiplexes
        Definitions
        Lineup
        Signal strength
    Demodulation
        QPSK
        8PSK
        Symbol rate
    Error Correction
        Bit error rate
    Measuring signal
Terrestrial Television
    Reception
    Frequencies
    Terrestrial Demodulation
MPEG2 Transport Stream
    Note about MPEG
    MPEG2-TS in OSI model
    MPEG2-TS: time multiplexing
    MPEG2-TS as a file container
    MPEG2-TS: Implementation details
    PSI tables
    SI tables
    Multi Program or Single Program
Descrambling
    Conditional Access Module
    Conditional Access System
    Communicate with the CAM
    Keys update
    How to choose a CAM
IP networks
    RTP
    UDP
    IP
    Multicast
    Ethernet
RTSP
    TCP
    RTSP details
        Requests
        Headers
        Answers
        Example
Over-The-Top TV
    Principles
    Pros and Cons
HTTP
    Requests
    Answers
    Headers
MPEG4 file formats
    ISO Base Media File format
    MP4
    PIFF/ISMV
HLS
    Segments
    Variant playlist
    Segment playlists
    Multiple audio
    HLS scrambling
    Digital Rights Management
    Example with Verimatrix key server
Smooth Streaming
    Fragments
    Manifest
    Scrambling
    PlayReady
HDS
    Fragments
    Manifest
MPEG Dash
    Segments
    Media Presentation Description
Input formats
    Smooth Streaming as a pivot format
    Smooth Streaming server file format
    Server manifest
    Protocol
    Multibitrate TS
Subtitles
    EIA-608
    Teletext
    DVB-SUB
    TTML and DFXP
    SRT
Ecosystem
    Head end
    DVB-to-IPTV Gateway

Preamble

Anevia provides this document to its customers, partners and integrators to help them succeed with IPTV installs by explaining the theory behind IPTV.

This document is a good complement to the training courses offered by the Anevia Operations Team, as well as to the product documentation, the Anevia Troubleshooting guide and the Installation prerequisites.

After an introduction to audio and video compression, this document focuses on the transport of digital television and how a DVB-to-IPTV gateway converts it to IP. We will then see how a Video-On-Demand server works with the RTSP protocol to control a streaming session. We will introduce a new way of transporting IPTV: Over-The-Top TV. Finally, we will describe the different systems and partners involved in an IPTV setup.

If you have any questions, do not hesitate to contact us through our main office or web site: http://support.anevia.com.


Compression

An uncompressed standard-definition video stream (720 × 576 pixels, 25 frames per second, 24 bits per pixel) requires:

720 × 576 × 25 × 24 ≈ 250 Mbit/s (Mbps)

As a usual television satellite transports data at around 50 Mbps and a usual ADSL line carries around 8 Mbps, we need to compress the video stream to transport it.
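As a quick sanity check of this arithmetic, here is a minimal Python sketch (using the values above):

    # Uncompressed SD bitrate: 720x576 pixels, 25 frames/s, 24 bits per pixel.
    width, height, fps, bits_per_pixel = 720, 576, 25, 24

    raw_bps = width * height * fps * bits_per_pixel
    print(raw_bps / 1e6)        # ~248.8 Mbit/s, the ~250 Mbps quoted above

    # Compression ratio needed to fit the stream into an 8 Mbps ADSL line:
    print(raw_bps / 8e6)        # ~31x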

Typical bitrates

Below are typical audio and video bitrates after compression:

Type   Bit rate         Quality
Audio  8 kbit/s         Telephone
Audio  32 kbit/s        MW/AM radio
Audio  96 kbit/s        FM radio
Audio  256-320 kbit/s   Near CD
Video  2 Mbit/s         Standard TV channel
Video  8 Mbit/s         HDTV

Figure 1: Typical bitrates

Lossy and lossless compression

A compression algorithm is a set of mathematical techniques used to reduce this bitrate. Some of these techniques do not affect the image quality at all, much like a zip file: after extracting the archive, the initial file is fully restored. This is called lossless compression.

Conversely, other techniques slightly affect the image quality, in a way that should go unnoticed by the viewer, as in a JPEG picture. These techniques usually achieve a very good compression ratio. This is called lossy compression. Digital TV always uses lossy compression techniques, as the bitrate must be reduced drastically.

Group of pictures

One of the most efficient techniques used in video compression consists in compressing raw images in different ways:

I-picture or I-frame (intra coded picture): a reference picture corresponding to a fixed image, independent of other picture types. With an I-frame, the decoder has all the information it needs to display the picture on the screen.

P-picture or P-frame (predictive coded picture): contains only the difference from the preceding I- or P-frame. This is especially efficient for compressing a scene with similar pictures. The decoder needs the previous frame, already uncompressed, to decode the current one.

B-picture or B-frame (bidirectional predictive coded picture): contains difference information from the preceding and following I- or P-frame. The decoder must first receive and uncompress the preceding and the following picture to be able to uncompress the current B-frame.


Figure 2: The P-frame only contains the difference with preceding image

In H.264, there are two kinds of I-frames:

standard I-frames, which previous and future frames may need for decoding

instantaneous decoding refresh (IDR) frames, which only future frames will need for decoding. This is appropriate for starting a new stream.

A group of pictures (GOP) is a set of pictures beginning with an I-frame. It may contain a variable number of P-frames and B-frames. A compressed video stream consists of successive GOPs.

The GOP structure is often referred to by two numbers, for example M=3, N=12. The first gives the distance between two anchor frames (I or P). The second gives the distance between two full images (I-frames): it is the GOP length. The more I-frames the compressed stream has, the more editable it is. However, having more I-frames increases the stream size.
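As an illustration, a minimal Python sketch (not from the original text) deriving the display-order frame pattern of a GOP from M and N:

    def gop_pattern(m: int, n: int) -> str:
        """Frame types of one GOP: anchors (I or P) every m frames, length n."""
        frames = []
        for i in range(n):
            if i == 0:
                frames.append("I")    # full image, independently decodable
            elif i % m == 0:
                frames.append("P")    # anchor: difference from previous anchor
            else:
                frames.append("B")    # difference from surrounding anchors
        return "".join(frames)

    print(gop_pattern(3, 12))         # -> IBBPBBPBBPBB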

Other techniques

We will not go into the details of the other techniques used in video compression. Just remember that since all these tools are mathematical tools, the higher the compression ratio, the more CPU power the decoder requires to uncompress the video.

Some technical names: quantization, zig-zag scanning, Discrete Cosine Transform, chroma subsampling, wavelets...

Standard video codecs

These techniques can be combined when compressing video. Standards define which techniques are to be used; these video algorithms are named codecs (compressor-decompressor). The most common are:

H.262/MPEG2-Part2 (MPEG-2 Video): used in DVD and digital television

MPEG4-Part2 (MPEG4 Advanced Simple Profile): used in DivX

H.264/MPEG4 AVC or MPEG4-Part10 (MPEG4 Advanced Video Coding): used in Blu-ray and HD digital television

Audio compression

Audio compression techniques are also used when transporting audio streams next to the video stream. Here are some common audio codecs:

• MPEG-1 Audio and MPEG-2 Audio layer III (MP3)
• Advanced Audio Coding (AAC) (MPEG4 Part 3)
• Dolby Digital AC3


Figure 3: OSI model applied to the IP stack

Encapsulation

In the OSI model, sending data from one point to another involves encapsulating the data at every layer, each layer adding its specific information in the form of a header. Conversely, receiving data consists of decapsulating the data at every layer.

Figure 4: Encapsulation and decapsulation in the OSI model when transmitting data

Connected vs not connected protocols

At each level of the OSI model, a protocol can be not connected:

• no assurance of packet arrival
• no assurance about arrival order
• examples: Ethernet, IP, UDP (transport level)

Or connected:

• acknowledgement
• retransmission
• example: TCP (transport level)

We can have connected protocols over not connected protocols and vice versa. Not connected protocols are suitable for real-time applications with high traffic, where retransmissions would cause too much latency. Connected protocols are suitable for control protocols, where some latency is acceptable but packet loss or reordering is not.

Gateway

In this model, a gateway is defined as a device that converts from one stack to another, keeping the upper layers preserved. A gateway can convert between two different physical layers. A gateway is characterized by the highest layer that it preserves.

A network switch is a level 2 gateway: it converts from one physical layer (one network cable) to another physical layer (another network cable), preserving the data link layer (Ethernet).

A network router is a level 3 gateway: it converts from one data link layer (an Ethernet network) to another data link layer (another Ethernet network), preserving the network layer (IP).

Figure 5: A gateway level 4 in OSI

IPTV Gateway

An IPTV gateway receives digital TV data over radiowaves and converts it to digital TV data over network cables. This is a level 7 gateway, Application layer of the OSI model.


Satellite Television

Digital satellite TV has been standardized by the Digital Video Broadcasting (DVB) consortium, under the DVB-S and DVB-S2 standards.

Positioning

Digital satellite TV consists of sending data from the ground (uplink) and using the satellite as a reflector.

Figure 7: Uplink and downlink

The beam reflected to the ground covers a large geographic area; see for example the coverage of the Hotbird satellite at 13° East in Figure 8.

Figure 8: Coverage of the Hotbird 13° E satellite

Telecom satellites are located on the geostationary orbit, around 36 000 km above the equator. From the ground, all satellites on the geostationary orbit are located on an imaginary arc above the equator. This arc is towards the south in the northern hemisphere, see Figure 10.

Elevation: the up and down angle
Azimuth: the East and West rotation angle

An orbital position in the sky is often the place for a satellite cluster, where more than one satellite is sending different signals. For example, when pointing a dish at the 19.2° East position, you will receive the signal from Astra 1H, Astra 1KR, Astra 1L, Astra 1M and Astra 2C.


Figure 9: Eutelsat satellites seen from space

Figure 10: Telecom satellites seen from the ground


Frequency range

The full radiowave spectrum (3 MHz to 300 GHz) is divided into named ranges. Below are the ranges used in television:

VHF band: 30 MHz to 300 MHz (terrestrial analog TV)
UHF band: 300 MHz to 3000 MHz (terrestrial analog and digital TV)
L band: 950 MHz to 2150 MHz (cable TV)
C band: 3700 MHz to 4200 MHz (some satellite TV)
Ku band: 10750 MHz to 12750 MHz (most satellite TV)

Figure 12: Frequency range (logarithmic scale)

As satellites are extremely costly, satellite broadcasters use a property of radiowaves called polarization: each radiowave can be sent at a specific angle. Using this property, satellite broadcasters send radiowaves at two different angles, doubling the amount of data sent within one radio frequency. The two polarities used are Horizontal and Vertical.

Figure 13: Horizontal and Vertical polarization of a radiowave

Reception

The satellite dish uses its geometry to focus the incoming Ku-band radiowave onto a reception device called an LNB (Low Noise Block converter). At the output of the LNB, the signal is transferred over a coaxial cable, often called an RG (Radio Guide) cable, ending with F-connectors. These cables are able to transmit bidirectional data and power.

This cable will be plugged into a satellite receiver (PCI card for PC, home receiver, professional receiver ...) see Figure 15.

Signals can be transmitted on the RG cable using the L band.

As the Ku satellite band is around 2000 MHz wide and the L band is around 1000 MHz wide, we can only translate half of the Ku band onto one coaxial cable. Furthermore, polarity cannot be preserved over the L band.

So the LNB has to convert only one fourth of the incoming Ku band onto one cable:

• Horizontal polarity, low Ku band (10750 to 11700 MHz)
• Horizontal polarity, high Ku band (11700 to 12750 MHz)
• Vertical polarity, low Ku band
• Vertical polarity, high Ku band

Figure 14: Satellite dish with an LNB

These four sections are called quadrants.

The conversion from Ku band to L band requires a translation using offsets. For the low band, we have:

L_frequency(MHz) = Ku_frequency(MHz) - 9750

For the high band:

L_frequency(MHz) = Ku_frequency(MHz) - 10600
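A minimal Python sketch of this conversion (the function name is mine):

    def ku_to_l_band_mhz(ku_mhz: float) -> float:
        """Ku-band frequency to the L-band frequency seen on the coaxial cable."""
        if 10750 <= ku_mhz < 11700:
            return ku_mhz - 9750      # low band offset
        if 11700 <= ku_mhz <= 12750:
            return ku_mhz - 10600     # high band offset
        raise ValueError("frequency outside the Ku band")

    print(ku_to_l_band_mhz(11538))    # -> 1788.0 (MHz), low band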

Selection

Quadrant selection

The common LNB used in end-consumer installs is the single LNB, with only one RG cable at the output. The satellite receiver powers the LNB and commands it to select one of the four quadrants by changing its power supply, following these four rules:

• Vertical: 13 V
• Horizontal: 18 V
• Low band: direct current
• High band: alternating current at 22 kHz

Figure 16: Satellite receiver selecting Quadrant

Therefore a single LNB can only transmit one quadrant at a time. With an end-consumer receiver this is generally not a problem, as only one TV channel from one quadrant is watched at a time. With a professional receiver, this is a problem, as we may want to receive TV channels from different quadrants simultaneously.

In this case we need to use a quattro LNB with 4 RG cables at the output. The quattro LNB is able to transmit each of the four quadrants inside each of the four cables.

Using a multiswitch with one to four quattro LNBs allows selecting each quadrant from several satellites.

Satellite receivers control multiswitches either through quadrant selection and sec tone burst, or through DiSEqC (Digital Satellite Equipment Control), which controls both the quadrant selection and which LNB to select from.

Sec Tone burst

Also named minisec, sec tone burst allows selecting between two satellites.

Before sending the message, the alternating current used for quadrant selection must be stopped for 15 ms. Then a binary message is sent during 15 ms from the satellite receiver to the LNB or the multiswitch. The binary message is still sent at the voltage chosen by the satellite receiver.

Each bit is coded over 1.5 ms:


Figure 17: Quattro LNB

Figure 18: A multiswitch with 16 inputs from four quattro LNBs


• 0: 1 ms of 22 kHz alternating current and 0.5 ms of direct current
• 1: 0.5 ms of 22 kHz alternating current and 1 ms of direct current

With the following meaning:

• satellite A: alternating current at 22 kHz during 12.5 ms
• satellite B: 9 bits "1". Total duration: 8 × 1.5 + 0.5 = 12.5 ms (we do not take into account the last 1 ms of direct current)

DiSEqC

Figure 20: DiSEqC

The DiSEqC (Digital Satellite Equipment Control) protocol has been written by Visiosat and Eutelsat. It is meant to replace quadrant selection and sec tone burst. Various versions of DiSEqC exist, all of them backward compatible.

A DiSEqC message is a 54 ms binary message that uses the same bit coding as sec tone burst but allows many more commands (quadrant selection, selection among 4 satellites, cascading level...). Like sec tone burst, it must be preceded by a 15 ms direct current slot. When possible, it is recommended to append the corresponding quadrant selection and sec tone burst messages for compatibility.

A full DiSEqC message with compatibility will be:

Message = QuadrantSelection + 15 ms direct current + 54 ms DiSEqC + 15 ms direct current + 12.5 ms minisec

Transponders and multiplexes

Definitions

A transponder is an electronic device on a satellite that receives a signal from an uplink, converts it to a new frequency and polarity, amplifies it, and sends it back to Earth. Satellites are equipped with a variable number of transponders.

A multiplex is a group of channels broadcast together over one frequency. At the end user side, typically at the decoder level, these channels are split. This is the de-multiplexing process which allocates a separate ID to each channel on the TV set. The number of channels carried over one multiplex is very variable (from 1 to 30).

Each transponder conveys one multiplex, explaining why the two words are often interchanged. Generally, in satellite television we speak of transponders while with terrestrial television we speak of multiplex.

Each transponder emits over one frequency and one polarity. To receive data from a transponder, a receiver must tune to this frequency/polarity.

Lineup

The word lineup is used to describe the list of the channels that we will broadcast on the IPTV side.

To know the list of available transponders, their settings and the list of channels they convey for many satellite fleets, we strongly suggest using the web site www.kingofsat.net, see Figure 21.


Figure 21: Kingofsat showing one transponder from Astra 1KR (19.2E)

Keep in mind that a satellite receiver usually includes only one tuner. Each receiver can therefore only receive one transponder. A Flamingo 660S with 6 satellite inputs can only receive 6 transponders simultaneously.

Signal strength

Signal strength is measured in decibels microvolt (dBµV): the ratio in dB of the received voltage compared to one microvolt.

Another unit for signal strength is the decibel milliwatt (dBm). When using RG cables with an impedance of 75 Ω, we have the following correspondence:

SignalStrength(dBµV) = SignalStrength(dBm) + 109
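A minimal Python sketch of this correspondence (valid for 75 Ω cables only):

    def dbm_to_dbuv(dbm: float) -> float:
        return dbm + 109              # 109 ~= 90 + 10 * log10(75)

    def dbuv_to_dbm(dbuv: float) -> float:
        return dbuv - 109

    print(dbm_to_dbuv(-49))           # -> 60 dBµV, the minimum power quoted in
                                      #    Measuring signal below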

All the transponders can be seen with a spectrum analyser, as the signal strength is higher at each transponder frequency. Figure 22 shows the spectrum analysis of the horizontal polarity of a satellite. The X-axis holds the frequencies in L band, with a red line on 1508 MHz. The Y-axis shows the signal strength, expressed in dBµV. Each peak on the graph is a transponder.

Figure 22: Five transponders seen with a spectrum analyser (source: Promax documentation)


Demodulation

As we are using sinusoidal radiowaves (Ku band and then L band), we can only transmit analog signals. If we want to transmit digital TV (zeroes and ones), we need a way to send binary data over sinusoidal radiowaves. This is called modulation; recovering the binary data on the receiver side is called demodulation.

QPSK

There are various types of modulation. The one used for DVB-S is a phase-shift modulation called Quadrature phase-shift keying (QPSK). It consists of defining 4 phase-shifts in the sinusoidal signal by comparison to a carrier radiowave and associating them with 4 symbols. Each symbol is associated with 2 bits, see Figure 23.

Figure 23: The 4 symbols for quadrature phase-shift keying, in comparison to a carrier

8PSK

The modulation used in DVB-S2 is also a phase-shift modulation, named 8 Phase-Shift Keying (8PSK). It defines 8 phase-shifts for 8 symbols, each symbol coding for 3 bits.

Symbol rate

Each transponder sends the sinusoidal signal at a different speed, defining a different symbol rate, expressed in kilosymbols per second (kS/s), also named kilobaud (kBd). Typical values are 27000 kS/s, 30000 kS/s...

Some receivers need to know the symbol rate used by the transponder. Others will guess it.

The bitrate is given by the symbol rate multiplied by the number of bits per symbol (2 bits for QPSK, 3 bits for 8PSK). A transponder emitting at 27000 kSymbol/s using QPSK will therefore transmit at 54 000 kbps (kilobit per second).

However the available bitrate for DVB-S is lower because of the error correction mechanisms, see below.

Error Correction

Because satellite communication does not allow exchange, it is impossible for a receiver to ask for new data if corruption occurred. And because Ku radiowaves go through the entire atmosphere, they may get very perturbed. To maximize the chances of receiving a correct transmission, the DVB-S standards include error correction, which consists of sending redundant data. The redundancy allows the receiver to detect a limited number of errors that may occur anywhere in the message, and often to correct these errors without retransmission.

There are two error corrections in DVB-S and DVB-S2:

• Viterbi correction, or FEC (Forward Error Correction). This is appropriate for continuous streams. It depends on the transponder and can be seen on www.kingofsat.net, see Figure 21. A usual FEC rate is 2/3, meaning that for every 2 bits of data sent, an extra redundant bit is sent.


For DVB-S2, it is a bit more complex because of some additional headers.
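Putting the symbol rate, modulation and FEC figures together, a minimal Python sketch (DVB-S case, ignoring the additional correction layers and headers):

    BITS_PER_SYMBOL = {"QPSK": 2, "8PSK": 3}

    def useful_bitrate_kbps(symbol_rate_ks: float, modulation: str,
                            fec: float) -> float:
        gross = symbol_rate_ks * BITS_PER_SYMBOL[modulation]  # e.g. 54000 kbps
        return gross * fec            # share left once redundancy is removed

    print(useful_bitrate_kbps(27000, "QPSK", 2 / 3))          # -> 36000.0 kbps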

Bit error rate

The number of bit errors is the number of received bits of a data stream over a communication channel that have been altered due to noise, interference, distortion or bit synchronization errors.

The bit error rate (BER) is the number of bit errors divided by the total number of transferred bits during a studied time interval. BER is a unitless performance measure. We distinguish two BERs:

CBER: the bit error rate before error correction, as received by the receiver
VBER: the bit error rate after error correction, the one that is really used

Measuring signal

Bad reception is the most common problem found when running an IPTV head-end.

Figure 24: Good reception is the key to successful installs!

It is therefore mandatory to measure the signal with a professional digital measuring tool such as the Promax TV Explorer (Figure 25). These kinds of tools give you the signal strength but also the CBER and VBER. Some of them can even display the decoded video.

Anevia devices require a minimal signal quality, for both signal strength and bit error rate:

• Power > 60 dBµV
• CBER < 1.0 × 10^-3


Figure 25: Promax TVExplorer 2 signal measuring tool


Terrestrial Television

Digital terrestrial TV has been standardized under the DVB-T standard. Its successor, DVB-T2, allows higher-capacity multiplexes and there are still very few DVB-T2 systems deployed.

Reception

To receive a DVB-T signal, you need a directional antenna. DVB-T transmitters are spread across the territory and one should point the antenna towards the closest transmitter.

Figure 27: Antenna for receiving DVB-T signals

Frequencies

DVB-T uses the UHF band (300 MHz to 3000 MHz).

Contrary to what happens in DVB-S, the base frequencies to use within the range are fixed and are given by this formula:

Frequency(MHz) = 306 + UHFchannel × 8

with the UHF channel within the range 21 to 68 (frequencies from 470 to 860 MHz).

To avoid conflicts between transmitters, they emit on the base frequency plus or minus an offset depending on their geographic location. Offsets used are 0.166 and 0.322 MHz, see Figure 28.
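A minimal Python sketch of this formula (the function name is mine; the offset is signed):

    def dvbt_frequency_mhz(uhf_channel: int, offset_mhz: float = 0.0) -> float:
        if not 21 <= uhf_channel <= 68:
            raise ValueError("UHF channel must be within 21..68")
        return 306 + uhf_channel * 8 + offset_mhz

    print(dvbt_frequency_mhz(22, 0.166))  # -> 482.166, multiplex R3 in Figure 29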


DVB-T usually includes various multiplexes emitting over different frequencies; see as an example the list of multiplexes broadcast by the Eiffel Tower, Paris, in Figure 29.

Name  UHF Chan.  Freq (MHz)  Channels list
R3    22         482.166     Canal+, Canal+ Cinéma, Canal+ Sport, Planète, TPS Star
R2    25         506.166     BFM TV, Direct 8, DirectStar, France 4, Gulli, i>TELE
R5    28         530.166     TF1 HD, France2 HD, M6 HD
R4    30         546.166     M6, W9, NT1, Paris Première, Arte HD
R6    32         562.166     TF1, NRJ12, LCI, Eurosport, TF6, TMC, ARTE
R7    33         570.166     Canal 21, IDF1, NRJ Paris, BFM Business
R1    35         586.166     France 2, France 5, France Ô, LCP, France 3

Figure 29: List of multiplexes broadcast by the Eiffel Tower, Paris

Terrestrial Demodulation

Orthogonal Frequency Division Multiplexing (OFDM) is a frequency-division multiplexing scheme used as a digital multi-carrier modulation method. A large number of closely-spaced sub-carriers are used to carry data. Each sub-carrier is modulated with a conventional modulation scheme at a low symbol rate, maintaining total data rates similar to conventional single-carrier modulation schemes in the same bandwidth.

Quadrature amplitude modulation (QAM) is a modulation scheme where amplitude and phase are shifted with regard to the reference carrier. The number of symbols may vary (8, 16, 64).

Figure 30: QAM 8 symbols coding for 3 bits

Both OFDM and QAM are used in DVB-T: each sub-carrier from OFDM is modulated with QAM. Some countries use QAM with 16 symbols (each symbol coding for 4 bits) while others use QAM with 64 symbols (each symbol coding for 6 bits). DVB-T also uses FEC, with typical rates 2/3 and 3/4.


MPEG2 Transport Stream

Note about MPEG

MPEG stands for Moving Picture Experts Group. It is a working group of experts formed by ISO and IEC to set standards for audio and video compression and transmission.

The group publishes recommendations and tends to group them under one name. MPEG2 was published in 1995, MPEG4 in 1998. Each MPEG version includes many things: file containers, video codecs and audio codecs.

Type               MPEG2                  MPEG4
File container     MPEG2-TS (streaming),  MPEG4-part14 (.MP4)
                   MPEG2-PS (DVD)
Video compression  MPEG2-video (H.262)    MPEG4-part10 (AVC/h.264),
                                          MPEG4-part2 (DivX)
Audio compression  MPEG2-audio            MPEG4-part3 (AAC)
Total              11 standards           27 standards

Figure 31: MPEG2 and MPEG4: many standards

All DVB digital broadcasting uses the MPEG2 Transport Stream (MPEG2-TS) for transport. Broadcasters may use various video and audio codecs, but they will always stream the data over MPEG2-TS.

Please be careful when using the word MPEG, and try to be precise, as it may get very confusing. When speaking about MPEG4 TV, we should specify that we are talking about the MPEG4 video codec (AVC/h.264) transported over an MPEG2 Transport Stream.

MPEG2-TS in OSI model

After seeing how we can receive DVB-S or DVB-T radiowaves, and how to demodulate them into a binary stream, we will now see how audio and video are transported. This uses the MPEG Transport Stream protocol. Transport Stream is specified in MPEG-2 Part 1 (Systems), also known as ISO/IEC standard 13818-1 or ITU-T Rec. H.222.0.

MPEG2-TS is at the application layer from the OSI model, on level 7. It can be transported over a modulated signal on radiowaves or on a twisted pair with the IP stack, see Figure 6.

MPEG2-TS: time multiplexing

The name Elementary Stream (ES) is given to a track, be it audio, video, subtitles or anything else.

A multiplex is a set of channels, each containing several ES (typically one video, one or two audio and one or two subtitle tracks); all of them have to be transported in one single bitstream, emitted on one frequency.

MPEG2-TS multiplexes the ES by dividing them into small chunks and transmitting each ES alternately over time, hence the name time multiplexing.

In Figure 32, we see how one video ES with a high bitrate, one audio ES with a smaller bitrate and an even smaller subtitle ES are multiplexed. We could get the same result with many more ES, as happens on DVB-S or DVB-T multiplexes.


MPEG2-TS as a file container

The MPEG2-TS has been designed as a streaming format, but it can also be used as a file storage format. In this case, the file is just a concatenation on disk of the TS packets. The file extension used is usually .ts or .mpg.

MPEG2-TS: Implementation details

The MPEG2-TS protocol divides the ES into fixed-size chunks of 188 bytes: the TS packets. Each TS packet has a TS header containing:

• a start byte 0x47 (also named sync byte)
• a PID, the numerical identifier of the ES
• a Continuity Counter, a number incremented at each TS packet, cycling between 0 and 15, to detect packet loss or order mismatch. The continuity counter is just an indicator that TS packets have been lost or shuffled; there is no way to recover them
• a scrambling indicator telling whether this TS packet is scrambled
• optional additional headers (Adaptation field, PES extension) containing:
  - info about the ES (beginning of a video frame for example)
  - timestamps (Program Clock Reference, Decoding Time Stamp and Presentation Time Stamp) sent to the decoder so it can decode the ES in sync with the other ES

Figure 33: The details of one TS packet with Adaptation field seen with Wireshark
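To make the header layout concrete, here is a minimal Python sketch (not Anevia code) extracting these fields from one packet:

    def parse_ts_header(packet: bytes) -> dict:
        assert len(packet) == 188 and packet[0] == 0x47, "not a TS packet"
        return {
            "pid": ((packet[1] & 0x1F) << 8) | packet[2],  # 13-bit ES identifier
            "payload_start": bool(packet[1] & 0x40),       # start of a PES frame
            "scrambled": (packet[3] >> 6) != 0,            # scrambling indicator
            "continuity_counter": packet[3] & 0x0F,        # cycles 0..15
        }

    # Example: a null packet (PID 0x1FFF) with continuity counter 7.
    print(parse_ts_header(bytes([0x47, 0x1F, 0xFF, 0x17]) + bytes(184)))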


Because viewers may choose from multiple programs on a single transport stream, a decoder must be able to quickly sort and access video, audio and data for the various programs. Program Specific Information (PSI) acts as a table of contents for the transport stream, providing the decoder with the data it needs to find each program and present it to the viewer.

Program Association Table (PAT): a root directory for the transport stream; the table provides the PID value of the packets containing the PMT associated with each program. PID: 0x0000/0

Conditional Access Table (CAT): provides the PID value of the packets containing each Entitlement Management Message (EMM), see Keys update. PID: 0x0001/1

Program Map Table (PMT): lists the PID values of the video, audio, clock reference and data ES of one channel. It also lists the PID value of each Entitlement Control Message (ECM) in the stream, see Keys update. PID: read from the PAT

Figure 34: PSI tables

PSI tables help the decoder locate audio and video for each program in the transport stream and verify Conditional Access (CA) rights. The tables are repeated frequently (for example, 10 times/second) in the stream. See Figure 34 for the list of the PSI tables.

A decoder receiving a Transport Stream must read PID 0x0000 to learn the list of channels, and then read each PMT to know the video and audio ES of each channel, as in Figure 35.

Figure 35: Decoder reading PAT and PMT
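As an illustration of this bootstrap, a simplified Python sketch of PAT parsing; it assumes the whole PAT section fits in one TS packet carrying no adaptation field:

    def parse_pat(ts_packet: bytes) -> dict:
        """Map program_number -> PMT PID from a PAT in a single TS packet."""
        payload = ts_packet[4:]                   # skip the 4-byte TS header
        section = payload[1 + payload[0]:]        # skip the pointer_field
        section_length = ((section[1] & 0x0F) << 8) | section[2]
        programs = {}
        data = section[8:3 + section_length - 4]  # entries, minus 4 CRC bytes
        for i in range(0, len(data), 4):
            number = (data[i] << 8) | data[i + 1]
            pid = ((data[i + 2] & 0x1F) << 8) | data[i + 3]
            if number != 0:                       # program 0 points to the NIT
                programs[number] = pid
        return programs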

SI tables

SI (Service Information) tables are defined by the DVB standards and are also known as DVB tables.

They give service providers the necessary tools to add package information and meta-data to the Transport Stream. These tables are added to the MPEG-2 transport stream during encoding or multiplexing.


Figure 36: The most common SI tables

Multi Program or Single Program

DVB-S and DVB-T multiplexes transmit more than one channel: the Transport Stream is an MPTS (Multi Program Transport Stream). In this case, the PAT links to more than one PMT. But we can imagine an SPTS (Single Program Transport Stream) where the PAT only links to one PMT. This is the common case in IPTV.

In Figure 37, we compare the hierarchy of an MPTS with SI tables and an SPTS without any SI tables.


Descrambling

Many broadcasters, especially via satellite, scramble their content to prevent people from watching the channel for free. This is called conditional access (CA). To descramble, you will need a CAM and a valid smartcard.

Other channels are broadcast unscrambled, or Free To Air (FTA).

Conditional Access Module

A Conditional Access Module (CAM) is a small computer responsible for descrambling scrambled channels. It contains a CPU, RAM, a real-time operating system and a DVB-CSA descrambler chip (see below).

Figure 38: A Conditional Access Module

It has a PCMCIA connector that allows it to connect into a PCMCIA slot. The protocol used to communicate between the host and the CAM is called Common Interface (CI). The other side of the CAM is a smartcard reader, as shown in Figure 39.

Figure 39: A CAM inserted in CI slot of a host, and receiving a smartcard

CAMs are made by CAM manufacturers: Aston, Smardtv, Mascom... They design the hardware, install the OS, and take care of the MPEG2-TS transmission and of the user interface (menus and popups). Then they include software libraries from the Conditional Access System (CAS) supplier.

Conditional Access System

A Conditional Access System (CAS) supplier designs the security system, the smartcards and the software libraries delivered in the CAM. They also provide the keys sent to broadcasters for the source scrambling.

Here are some major CAS suppliers: Nagravision, Viaccess, Irdeto, Cryptoworks, Conax...

All scrambled channels are scrambled using the same algorithm, named the Common Scrambling Algorithm (DVB-CSA). This is a strong cipher, optimized for hardware processing. The different CAS suppliers all implement this common algorithm, but differ in the way they manage and cipher the CSA keys. The CSA keys are broadcast and therefore need to be ciphered.


Communicate with the CAM

The host sends the scrambled multiplex Transport Stream to the CAM. The CAM sends back to the host the full multiplex Transport Stream, but with the selected PID descrambled (TS OUT in Figure 40, with red lines for scrambled PID and green lines for descrambled ones).

Figure 40: Dialogs between the host and the CAM

Keys update

Entitlement Control Messages (ECM) are sent in the TS stream. They contain the CSA descrambling keys, or Encrypted Control Words. To decrypt these Control Words, the CAM uses the smartcard information. These control words are changed every 10 seconds. The smartcard contains:

• a key to decrypt the current month's ECM
• the package info (optional channels)
• a PIN code

Smartcard information is updated every month through another broadcast message: the Entitlement Management Message (EMM), or monthly keys. These EMM messages can be sent to all smartcards, to a group of smartcards or to an individual smartcard. In the latter case the message is still broadcast, but ignored by all smartcards except the targeted one.

When the smartcard is not connected to the transponder where the EMM are broadcast during the EMM broadcast period (generally at the end of the month), the smartcard will not be able to decrypt the following month's ECM. In this case, you can call your service provider, which will generate a targeted EMM refreshing your smartcard's rights.

How to choose a CAM

It is hard to find information about CAM and smartcard compatibility.

The best-case scenario is one CAM descrambling 12 programs / 24 ES, but this is rarely the case. Most CAMs have a hard-coded or hardware limitation that prevents them from descrambling more than a certain number of programs / ES.

Anevia support collects the experience of its customers when trying to descramble packages. Before buying a CAM for a new package, ask Anevia, as we may have suggestions. If Anevia has no experience with a package, find the CAS used by the smartcard and then try all the possible CAMs implementing this CAS.


IP networks

Back to our figure of an IPTV gateway: we saw how to receive a DVB-S/DVB-T multiplexed stream, how to demodulate it to obtain a binary stream, how to demultiplex it to extract the ES of interest (audio and video) and how to descramble them with a CAM. We will now see how to send the result to the IP side, using network protocols.

Figure 41: A DVB-S to IPTV gateway in OSI

The MPEG2-TS packets need to be sent to the IP network. On the IP side, as the bandwidth is limited, we often send only the SPTS of one TV channel in each IP stream.

As the TS packets are 188 bytes long and the maximum size of an IP packet is usually 1500 bytes, we can include up to 7 TS packets in one UDP/IP/Ethernet packet:

188 × 7 = 1316 (< 1500)

Figure 42: An IPTV packet showing all layers of encapsulation
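A minimal Python sketch of this encapsulation (file name and destination are hypothetical examples):

    import socket, time

    DEST = ("239.36.95.76", 1234)        # hypothetical multicast group and port
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    with open("channel.ts", "rb") as f:  # hypothetical SPTS file
        while True:
            datagram = f.read(188 * 7)   # 1316 bytes, fits in a 1500-byte packet
            if not datagram:
                break
            sock.sendto(datagram, DEST)
            time.sleep(0.001)            # crude pacing; real servers follow the PCR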

RTP

Real-time Transport Protocol (RTP) may optionally be used for transporting MPEG2-TS packets. The interest of RTP is that it provides a 16-bit sequence number (0 to 65535) that adds a new layer of packet loss detection (similar to the Continuity Counter in MPEG2-TS).

Each RTP packet contains 7 MPEG2-TS packets and is included in one UDP packet.

UDP

User Datagram Protocol (UDP) is a transport protocol, level 4 of the OSI model. It is a non-connected protocol.


Figure 43: The UDP header

It defines a source port and a destination port, which must be seen as addresses. The UDP destination port for IPTV is often 1234. In Figure 43, the Application data (message) part holds the packets coming from the upper layer (either 7 MPEG2-TS packets, or 1 RTP packet containing 7 MPEG2-TS packets).

IP

Internet Protocol v4 (IP) is a Network layer protocol (level 3).

Figure 44: The IP header

The Data field of Figure 44 contains the full packet from the upper layer; in our case, 7 MPEG2-TS packets within 1 UDP packet.

IP defines a source address and a destination address, which are 32-bit (4-byte) addresses often expressed in decimal form like X.X.X.X, with each X between 0 and 255.

Multicast

Network communications can be of three types:

unicast: two hosts communicate directly
broadcast: packets are emitted from one host towards all hosts in the same local network
multicast: one emitter sends packets to a group of registered hosts

Multicast as we know it is implemented at the Network layer by the IP protocol.

If the destination address of an IP packet is between 224.0.0.1 and 239.255.255.255, it is a multicast destination, or multicast group. Each host on the network can subscribe to or unsubscribe from the group by using the Internet Group Management Protocol (IGMP).


IGMP defines three kinds of messages:

Membership Report: hosts interested in joining a specific multicast group send out IGMP membership reports for that particular group. Reports are sent when first subscribing to the group (the report is then called an IGMP Join) and also in reply to each Membership Query.

Membership Query: the IGMP querier periodically sends IGMP membership queries to find out whether there are still hosts interested in receiving traffic from a particular multicast group. Hosts must answer with Reports if still interested. If not, traffic towards the uninterested local segments is stopped.

Leave Group: hosts can signal their intention to leave the group to their local multicast router, so that unnecessary traffic is stopped right away.

The network switch will maintain a table of the hosts that subscribed to the group. Each time a packet is sent to the group, the network switch will replicate it to each registered host. In Figure 45, the red host emits a multicast stream to a multicast group. The network switch M will only copy the packets to the registered hosts in green.

Figure 45: An host emitting to a multicast group (Source: Wikimedia Commons)

Multicast is only able to transport non-connected protocols. It is particularly suitable for live IPTV streaming because it saves overall bandwidth.

Multicast requires a multicast-enabled network. This implies that:

• all the switches are IGMP capable, sometimes referred to as IGMP snooping;
• a master switch plays the role of IGMP querier. It is the only switch/router to send IGMP Membership Queries.

It is often difficult to have multicast over the Internet, which explains the success of OTT, see Over-The-Top TV.
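A minimal Python sketch of the receiving side (group and port are the same hypothetical values as before): joining the group triggers the IGMP Membership Report, and the kernel then delivers the datagrams:

    import socket, struct

    GROUP, PORT = "239.36.95.76", 1234

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    # Subscribe: the kernel sends an IGMP Membership Report (Join) for the group.
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    datagram, sender = sock.recvfrom(2048)
    print(len(datagram), "bytes from", sender)   # typically 1316 = 7 x 188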

Ethernet

Ethernet is a protocol from the datalink layer (level 2).

The Ethernet header defines a source ethernet address and a destination ethernet address (also known as MAC addresses). These are 48-bit (6-byte) addresses, often expressed in hexadecimal form like XX:XX:XX:XX:XX:XX. Each ethernet card has a fixed ethernet address given by the manufacturer: the first 3 bytes are a manufacturer identifier, and the last 3 bytes are unique for each card from the same manufacturer, making each ethernet card unique.

The ethernet destination address of an IP multicast packet will be of the form 01:00:5E:xx:xx:xx.

The ethernet packet presented in Figure 46 includes the packet from the upper layer in its Payload field, that is, the 7 TS packets over UDP over IP.


RTSP

Real Time Streaming Protocol (RTSP) is a Video-On-Demand (VOD) control protocol from the Application layer of the OSI model (level 7).

RTSP is not used to transport the audio and video; it is only used to control the streaming, which is transported in another way. Typically, the RTSP protocol is used to start, pause, resume and stop a Video-On-Demand stream transported over TS/UDP.

As explained in Connected vs not connected protocols, RTSP requires an underlying connected protocol and uses TCP.

Figure 47: RTSP is at the Application layer from the OSI model, on level 7

TCP

Transmission Control Protocol (TCP) is a connected protocol located at the Transport layer of the OSI model (level 4). It runs over IP and is used for transporting many protocols like HTTP, FTP, SSH, SMTP, RTSP...

Figure 48: TCP and IP headers

The TCP header, Figure 48, includes a source port and a destination port that can be seen as addresses. It also includes some flags that can be on or off for each packet:

SYN: synchronization
ACK: acknowledgement
PSH: push, the packet contains data
FIN: end the connection gently (waits for ACK)
RST: reset, end the connection (no ACK)


Figure 49: The TCP three way handshake

RTSP details

RTSP is a client-server text protocol influenced by HTTP. The default TCP port for an RTSP server is 554. Once the TCP connection is opened, the client sends RTSP requests and the server replies with RTSP answers. In very rare cases the server can notify the client with an event.

Requests

Requests are formed of:

• the request type: OPTIONS, DESCRIBE, SETUP... (see Figure 50)
• the URL of the stream to play: rtsp://toucan/disk/stream.ts
• the version of the RTSP protocol: RTSP/1.0
• a carriage return \r\n
• some headers, one per line: Transport, Session, Scale...
• two carriage returns \r\n \r\n

OPTIONS: the client asks for the list of the supported requests.
DESCRIBE: the client asks for information about the stream provided in the URL.
SETUP: the client asks for a session number. Typical headers: Transport, x-playNow, x-mayNotify.
PLAY: start the streaming at speed Scale, starting from Range. Typical headers: Session, Scale, Range.
PAUSE: stop the streaming. Typical headers: Session, Scale.
GET_PARAMETER: ask for a status. Typical headers: Session.
TEARDOWN: end the session. Typical headers: Session.
ANNOUNCE: the server notifies the client of an event, such as end of stream reached. Typical headers: Session.

Figure 50: The most common RTSP requests

Headers


Session: the session number provided by the server to the client. Example: Session: EEFLE20564111871950006

Transport: a line describing the kind and destination of the stream. Example: Transport: RAW/RTP/UDP;mode="PLAY";unicast;destination=10.0.0.1;client_port=1234

Scale: the speed at which the file must be played; can be negative. Example: Scale: -2.0

Range: in a request, gives the start of the play; in an answer, gives information about the current position. Example: Range: npt=0.000-

x-playNow: start the play directly, without waiting for the PLAY request. Example: x-playNow:

x-mayNotify: allows the server to send ANNOUNCE events. Example: x-mayNotify:

Content-Length: if 0, no message is included; if not 0, a message is carried. This is the case when answering DESCRIBE.

User-Agent: a description of the client. Example: User-Agent: Amino

Server: a description of the server. Example: Server: AneviaManager2

Figure 51: The most common RTSP headers

Answers

Answers are formed of:

• the version of the RTSP protocol: RTSP/1.0
• a status code: 200, 404, 500...
• a status message: OK, File Not Found, Internal Server Error
• a carriage return \r\n
• some headers, one per line: Transport, Session, Scale...
• two carriage returns \r\n \r\n

The status codes use the same categories as in HTTP: 2XX for OK status, 4XX for invalid requests, 5XX for server errors.

Example

In Figure 52, we see how to start a stream in RTSP. The client begins with a SETUP command, indicating in the Transport header what kind of stream it expects. Here the client is asking for RAW/RAW/UDP, which means MPEG2-TS over UDP (no RTP). It requires the stream to be sent to destination IP 239.36.95.76 on UDP port 1234.

The server answers with status 200 OK, meaning that the request is successful, confirming the stream type and destination in the Transport header and answering with a Session number. From that point on, this session number must be sent in a Session header in every client request.

Then the client asks to start the streaming with a PLAY request, with the Session header and the Scale header set to 1, meaning a normal play, not a fast forward. The server answers with 200 OK, confirming the request. The Range header gives information about the current position in the file and the total length of the file. Here the file is 120 seconds long, and the play just began, so the current position is 0.

After this exchange, the streaming server starts sending the stream in TS/UDP to the agreed IP address. After this session start, the client would typically send regular GET_PARAMETER requests. If the user does some trickplay (fast rewind, slow motion, fast forward, seek), additional PLAY requests are issued. Finally, when the film ends, the session is closed with a TEARDOWN request.
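A minimal Python sketch of this exchange over a raw TCP socket (host and URL are the example values above; the CSeq header is standard RTSP although not shown in Figure 51, and the answer parsing is deliberately naive):

    import socket

    URL = "rtsp://toucan/disk/stream.ts"
    sock = socket.create_connection(("toucan", 554))   # default RTSP port

    def send(request: str) -> str:
        sock.sendall(request.encode())
        return sock.recv(4096).decode()   # naive: assumes one-shot answers

    reply = send(f"SETUP {URL} RTSP/1.0\r\n"
                 "CSeq: 1\r\n"
                 'Transport: RAW/RAW/UDP;mode="PLAY";'
                 "destination=239.36.95.76;client_port=1234\r\n\r\n")
    session = next(line.split()[1] for line in reply.splitlines()
                   if line.startswith("Session"))

    print(send(f"PLAY {URL} RTSP/1.0\r\n"
               f"CSeq: 2\r\nSession: {session}\r\nScale: 1\r\n\r\n"))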


Over-The-Top TV

Bringing IPTV to users requires a managed network: working multicast and constant bandwidth. This is hard to achieve outside of telco networks or local area networks. This is the reason for the emergence of Over-The-Top TV (OTT), which enables providers to deliver live TV and VOD through the Internet. Any connected device can then watch TV: PC, connected TV, smartphones, tablets... There are various protocols to deliver OTT:

• Apple HTTP Live Streaming (HLS)
• Microsoft Smooth Streaming
• Adobe HTTP Dynamic Streaming (HDS)
• MPEG DASH

Principles

The Over-The-Top formats share some common principles:

• use of HTTP over TCP
• adaptive streaming
• short video fragments (chunks) downloaded via independent HTTP requests
• streaming-enabled file formats (MPEG2-TS or MPEG4-part12 plus extensions)

The streaming server hosts a set of chunks with different bitrates (usually video compressed at various resolutions). It also hosts a playlist listing the URLs of the chunks.

The client downloads the playlist, and then downloads the first chunk of one quality. If it is not able to download it fast enough (download time greater than the duration of the chunk), it usually switches to a lower quality. If the download time is much shorter than the duration of the chunk, it may switch to a higher quality. This behaviour depends on the client.

Figure 53: Over-The-Top streaming example

For live, the playlist must be refreshed regularly; it provides an available live window of fixed duration, usually 30 seconds. For VOD, a single playlist lists all the chunks of a movie.

In the example of Figure 53, the client started by downloading the playlist, then downloaded the first chunk of the highest quality (1.5 Mbps). Since the download time was too long, it switched to the lower quality (1 Mbps) and then switched again to the lowest (0.5 Mbps).


Pros and Cons

Pros:

• it is unicast, so there is no multicast routing issue, which means it can be used over an uncontrolled network, especially over the Internet and wifi
• contrary to what happens with RTSP and unicast UDP streaming, there is one single TCP connection for both control and download. This avoids many firewall and NAT issues
• the client can adapt the bitrate and can still display the video even with a bad connection
• the protocols allow video in "live" and "VOD" mode
• HTTP is a well-known and powerful protocol with a strong ecosystem. There are good solutions to cache it, and HTTPS can be used for security

Cons:

• it is unicast, so there is no way to "broadcast" live TV. Each client creates a new TCP connection, which requires powerful servers and smart caching
• you have to encode or re-encode each stream several times to offer multiple bitrates
• fast forward is not easy with adaptive streaming

To understand the various OTT formats, we will first have a deeper look at two standards that are used by many OTT protocols: HTTP and the MPEG4 file formats. We will then review the various OTT protocols.

HTTP

Hyper Text Transfer Protocol (HTTP) is a layer 7 (application) protocol defined by the IETF.

It is a request/answer protocol: the client makes a request on an Internet resource (URL), and the server answers with a status code and, if needed, the requested file. HTTP requires a connected transport protocol (TCP), see TCP. The default TCP port for HTTP is 80.

HTTP is very similar to RTSP. In fact RTSP was inspired by HTTP.

Requests

The requests are of two types:

GET: download a file from a URL
POST: send a file or data to a URL

Other kinds exist but are rarely used (HEAD, PUT...). Requests begin with a request line containing:

• the request type: GET or POST
• the requested path: /path/to/file.ts
• the version of the HTTP protocol: HTTP/1.0 or HTTP/1.1

and then:

• some headers, one per line. Example: User-Agent: Chrome
• an empty line
• the file/data if this is a POST request; it can be of any type (text, binary). Example: This is a 33 bytes long text file


Answers

Answers are formed of a status line:

• the version of the HTTP protocol: HTTP/1.0 or HTTP/1.1
• a 3-digit status code: 200, 404, 500...
• a status message: OK, File Not Found, Internal Server Error

and then:

• some headers, one per line: Content-Length, Content-Type...
• an empty line
• the file/data; it can be of any type (text, binary). Example: This is a 33 bytes long text file

The status codes are sorted by the first digit:

• 1XX for temporary answers
• 2XX for OK status
• 3XX for redirections
• 4XX for invalid requests
• 5XX for server errors

Headers

Figure 54 lists some useful HTTP headers, used in both requests and answers.

Transfer-Encoding: indicates that the file is split in chunks sent in several requests. Example: Transfer-Encoding: chunked

Content-Length: if omitted or 0, no file is included; if not 0, gives the size in bytes of the file carried. Example: Content-Length: 59

Content-Type: tells the file type (known as mime-type). Example: Content-Type: text/html

User-Agent: a description of the client. Example: User-Agent: Chrome 3.1

Server: a description of the server. Example: Server: Apache2

Cache-Control: for dynamic content, tells the client how long it can keep a copy of the file. Example: Cache-Control: None

Authorization: gives an encoded user and password for authentication. Example: Authorization: Basic QWxhZ==

Range: allows a client to request only a part of a content. Example: Range: bytes=500-600

Figure 54: Some useful HTTP headers

Example

Request:

GET /live/disk1/mezz/DASH/dash/DASH-audio_eng=96000-video=800000-4991.ts HTTP/1.1\r\n
Accept-Encoding: identity\r\n
Host: 172.27.115.90\r\n
Connection: close\r\n
User-Agent: Python-urllib/2.6\r\n
\r\n

Answer:

Accept-Ranges: bytes\r\n
Connection: close\r\n
Content-Type: video/MP2T\r\n
\r\n
G@...AW...AW. ..0... [truncated]

MPEG4 file formats

ISO Base Media File format

The ISO Base Media File format described in MPEG4-part12 (ISO/IEC 14496-12) is designed as an extensible file format.

Tracks are maintained in a hierarchical data structure consisting of objects called atoms or boxes. Each box has a 4-letter type. The resulting file is composed only of boxes. A box may contain other boxes.

The file format supports streaming of media data over a network as well as local playback. The format describes many boxes, here are some of them with their relative positions:

ftyp: at the beginning of the file, describes the extensions used
moov: describes a full movie
    mvhd: movie header, overall declarations
    trak: container for an individual track or stream
        tkhd: track header, overall information about the track
moof: describes a movie fragment
    mfhd: movie fragment header
    traf: track fragment
mfra: movie fragment random access
mdat: container box which holds the actual media data
meta: metadata

The format allows storing single-track or multi-track content. These tracks may be pre-fragmented using the moof box. To have a streaming-enabled file, there must be a specific hint track.

MP4, Adobe F4V and Microsoft PIFF (ISMV) are extensions of the ISO Base Media File format that store video or audio tracks plus specific DRM.

MP4

The MP4 file format MPEG4-part14 (ISO/IEC 14496-14) defines some extensions over the ISO Base Media File format, especially to improve streaming capabilities. It introduces the IOD (Initial Object Descriptor) box.


Figure 55: A MPEG4-part12 streaming enabled file, not fragmented
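As an illustration, a minimal Python sketch that walks the top-level boxes of such a file, printing each 4-letter type and size (the file name is hypothetical):

    import struct

    def list_boxes(path: str) -> None:
        with open(path, "rb") as f:
            while True:
                header = f.read(8)
                if len(header) < 8:
                    break                              # end of file
                size, box_type = struct.unpack(">I4s", header)
                if size == 1:                          # 64-bit "largesize" form
                    size = struct.unpack(">Q", f.read(8))[0]
                    skip = size - 16
                else:
                    skip = size - 8
                print(box_type.decode("ascii", "replace"), size)
                if skip < 0:
                    break                              # malformed box; stop
                f.seek(skip, 1)                        # skip the box content

    list_boxes("movie.mp4")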


PIFF/ISMV

The second objective is to define a DRM system. This is done by adding the Protection System Specific Header and SampleEncryptionBox boxes, whose 4-letter identifiers are both uuid, and by defining a scrambling and descrambling algorithm based on AES.


HLS

HTTP Live Streaming (HLS) is an adaptive streaming protocol originally designed by Apple but now proposed as an RFC at the IETF. At the time of writing, version 4 of the protocol is documented in draft 8.

The terms used in HLS are:

Variant playlist: the main playlist. One sometimes finds the term meta-playlist
Segments: the video chunks
Variants: the available qualities/bitrates, each described in a single playlist

The RFC uses the term Variant playlist for the main playlist, and does not recommend a term for each individual playlist. Some people use Variant playlists for each bitrate playlist, which contradicts the RFC.

Segments

HLS segments have the following characteristics:

• a single-program MPEG2 Transport Stream file
• the video codec is h.264 (MPEG4-part10)
• the audio track is included in the same MPEG2-TS; no codec is preferred
• typical length is 10 seconds
• it must start with a PAT and a PMT

There is an exception, which is multiple audio, see below.

Variant playlist

The client must start playing HLS by reading the main playlist, known as the Variant playlist or meta-playlist. This playlist is an m3u file, a text file format invented for the Winamp media player. As the characters are encoded in UTF-8, the file format has been renamed m3u8.

Lines with a specific meaning (tags) start with a hash (#).

The playlist begins with an #EXTM3U tag. For each variant, it gives the bandwidth with the tag #EXT-X-STREAM-INF and a link to a bitrate playlist.

Segment playlists

Each quality/resolution/bitrate is described in an individual playlist. Each playlist lists the segment URLs, with additional metadata contained in tags:

#EXT-X-TARGETDURATION: specifies the maximum segment duration for this playlist
#EXTINF: specifies the duration and the title of a segment
#EXT-X-MEDIA-SEQUENCE: tells which is the first fragment of the playlist
#EXT-X-ENDLIST: closing tag, marking the difference between VOD and live. If the playlist does not end with this tag, it is live; in this case, the client must refresh the playlist regularly (typically every 10 seconds)
#EXT-X-KEY: (optional) describes the DRM applied to the segments. HLS only supports a single encryption, AES-128 (see HLS scrambling)


Figure 58: HLS playlists and chunks
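As an illustration (all URLs, bitrates and sequence numbers here are invented), a variant playlist could look like this:

    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=1500000
    high/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=1000000
    mid/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=500000
    low/playlist.m3u8

and one of the segment playlists, here a live one (no #EXT-X-ENDLIST, so the client keeps refreshing it):

    #EXTM3U
    #EXT-X-TARGETDURATION:10
    #EXT-X-MEDIA-SEQUENCE:4990
    #EXTINF:10,
    segment-4990.ts
    #EXTINF:10,
    segment-4991.ts
    #EXTINF:10,
    segment-4992.ts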

Multiple audio

Since version 4 of the protocol, HLS supports multiple audio tracks. In this case, the variant playlist links to audio-only playlists and to video-only playlists. The audio playlists point to audio-only segments, which are mpeg2-audio elementary streams with no encapsulation. The video-only chunks are single-program MPEG2-TS without audio. Audio segments and video segments must have the same duration.

The player downloads one audio segment and one video segment and has to play them synchronously. This makes better use of the available disk space and bandwidth, since one and only one audio track is streamed. This is just a possibility of the protocol; its use is not mandatory.

HLS scrambling

Digital Rights Management

OTT scrambling uses symmetric algorithms, meaning that the same key is used for scrambling and descrambling. If attackers can get the keys, they will be able to descramble. The key point for DRM providers is therefore to make sure the keys are stored and transmitted safely.

To avoid brute-force attacks, a regular key rotation is recommended.

The HLS standard describes how to cipher the HLS chunks with a 128-bit key using the AES algorithm. Verimatrix and Nagravision are companies that sell key servers able to deliver AES 128-bit keys, and which can therefore be used for HLS.

Example with Verimatrix key server

You will find in Figure 59 an example where the packager implements on-the-fly scrambling of HLS with key rotation, by requesting keys from a Verimatrix key server.


1. An HTTP request is made to the Verimatrix server to tell it that a new asset will need encryption. Each asset (live TV, VOD asset) has a specific resource ID, 1 in our example. Verimatrix precomputes an unlimited series of keys for this asset.

2. An origin/packager server is configured for a live channel. Here the channel name is Bloomberg, the resource ID is 1, its window length (duration of the playlist) is 60 seconds, and the key rotation period is 30 seconds.

3. The player requests the segment playlist.

4. The packager requests from the key server all the keys needed to build this playlist and stores them. In the HTTP request for the keys, it provides the resource ID of the channel with r=1, the type of asset with t=DTV (DTV = live, VOD = Video On Demand) and a unique identifier to implement key rotation, p=130000. For the p= parameter, the unix timestamp (number of seconds since 1970) of the first segment using this key is often used. For VOD, a counter is simply incremented with the elapsed duration.

5. The resulting playlist is sent to the client. It contains 6 segment names and the URLs of 2 keys.

6. Since the player already knows the URLs of the next two keys, it can start downloading them from the Verimatrix server. The Verimatrix server may add HTTPS and authentication at this step for security.

7. The player requests the first segment.

8. The packager scrambles the segment and sends it to the player.

The same key is used for the second and third segments. For the fourth one, the key linked to p=130030 will be used.

Smooth Streaming

Smooth Streaming is the OTT protocol designed by Microsoft. Fragments are the name of the chunks. Qualities are the various available bitrates.

Fragments

In Smooth Streaming, a fragment is defined in the PIFF specification (see PIFF/ISMV). It is made of one moof box and one mdat box and has a typical length of 2 seconds.

It contains either:

• an audio track. Accepted codecs are: PCM, WMA (Std, Pro), MPEG1/2 layer 3 (MP3), AAC
• a video track in H.264 (VC1 is also accepted, but with low use)

• a subtitle track in ttml/dfxp format

Manifest

The client starts by downloading the manifest, which is an XML file. From this manifest, the client is able to build the URLs of all the fragments. The manifest can refer to various audio and video tracks, and the client is able to switch between these tracks.

Then the client will download the appropriate chunks.

Each chunk is uniquely identified by its URL path (controlling the quality) and its name, which is a timestamp (controlling the position).
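For example, with the usual Smooth Streaming URL scheme (the server and presentation names here are hypothetical), a request for the 800 kbps video fragment starting at 2 seconds, expressed in the typical 100-nanosecond units, looks like:

http://server/bloomberg.isml/QualityLevels(800000)/Fragments(video=20000000)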

The first fragment in the Manifest has a time (t) and a duration (d). To obtain the timestamp of any fragment, add the durations (d) of all previous fragments to the time (t) of the first one.

With Smooth Streaming version 2.2, the repeat (r) attribute has been introduced to avoid describing the duration of each fragment when it is the same for all. In that case, the timestamp of the fragment at position r in the run is given by t + d × r.
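As an illustration, here is a minimal Python sketch that expands a list of (t, d, r) entries, assumed to be already parsed from the manifest, into fragment timestamps:

# Expand manifest entries into fragment start times.
# Each entry is (t, d, r): start time of the run (None to continue from
# the previous fragment), fragment duration, repeat count (1 if absent).
def fragment_timestamps(entries):
    timestamps = []
    current = 0
    for t, d, r in entries:
        if t is not None:
            current = t
        for _ in range(r):
            timestamps.append(current)
            current += d
    return timestamps

# One 2-second fragment at t=0 (100 ns units), then 3 more of equal duration:
# fragment_timestamps([(0, 20000000, 1), (None, 20000000, 3)])
# -> [0, 20000000, 40000000, 60000000]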

To distinguish between a live and a VOD stream, Smooth Streaming uses the XML attribute isLive when playing live, and nothing if it is VOD. With a VOD stream, the client has to download the manifest only once: it then knows the URLs of all fragments. With a live stream, the client has to request the manifest regularly to obtain the URLs of the new chunks.

However, this manifest refresh is not mandatory since the player knows the duration of each fragment thanks to the meta-data. It can guess the URLs of the future fragments.

The chunks of a live stream can be kept for some time to allow catch-up TV. This retention duration is given by the DVRWindowLength attribute.

The XML tag ProtectionElement is used to give information about the DRM.
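For PlayReady, this commonly takes a form like the following sketch, where the SystemID is the PlayReady system identifier and the header content, a base64 blob, is elided:

<Protection>
  <ProtectionHeader SystemID="9A04F079-9840-4286-AB92-E65BE0885F95">
    (base64-encoded PlayReady header)
  </ProtectionHeader>
</Protection>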

Scrambling

PlayReady

PlayReady is a scrambling technology from Microsoft, used mainly for scrambling Smooth Streaming content but also for HLS and DASH.

The scrambling process consists in assigning a key ID and a key seed, then providing a license server URL from which both parties, the origin server and the player, will retrieve the license.


Figure 61: PlayReady: SmoothStreaming scrambling

1. The packager requests the key from the DRM server by providing the key ID and the key seed
2. The packager encrypts the content using the obtained key
3. The client downloads the Manifest containing the scrambling parameters
4. The client downloads the scrambling key from the DRM server


HDS

HTTP Dynamic Streaming (HDS) is a protocol designed by Adobe for the Flash Player. Since it is getting less and less used, we will only review it briefly.

HDS documents use the terms Fragments and Manifest.

Fragments

The file format is MPEG4-part12 with some additions. The files have the f4v file extension. H.264 is the only allowed video codec. Audio codecs are MP3 and AAC. The typical length of a fragment is 2 seconds.

Manifest

The Flash Media Manifest is an XML file with the f4m extension. Here is an example with 3 different bitrates.

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns="http://ns.adobe.com/f4m/1.0">
  <id>myvideo</id>
  <duration>253</duration>
  <mimeType>video/x-flv</mimeType>
  <streamType>recorded</streamType>
  <baseURL>http://example.com</baseURL>
  <drmMetadata url="http://mydrmserver.com/mydrmmetadata"/>
  <bootstrapInfo profile="named" url="/mybootstrapinfo"/>
  <media url="/myvideo/low" bitrate="408" width="640" height="480"/>
  <media url="/myvideo/medium" bitrate="908" width="800" height="600"/>
  <media url="/myvideo/high" bitrate="1708" width="1920" height="1080"/>
</manifest>


MPEG Dash

MPEG-DASH (Dynamic Adaptive Streaming over HTTP) is an international standard (ISO/IEC 23009) that takes up the concepts of the proprietary protocols we already saw. Here is a list of features:

• various Periods. The set of available bitrates, languages, captions, subtitles ... does not change during a Period, but may change when changing Period. A good use would be a live TV channel streaming one show over one period, and another show over another period

• video and audio can be muxed together, or in separate tracks
• 2 segment formats, see below

The wording for DASH is:

Media Presentation Description for the playlist

Segments for the chunks

Representation for the various available qualities

Adaptation Set for the list of interchangeable encoded versions of one media content component. An asset with various video bitrates and various audio bitrates will have two Adaptation Sets, one for video, one for audio.

Segments

DASH segments can be of two types:

• MPEG2 Transport Stream
• ISO Base Media File Format (MPEG4-part12) using the MPEG4-part14 specificities

Media Presentation Description

The playlist in DASH is named Media Presentation Description (MPD). This is an XML file. In this XML we find the hierarchy MPD > Period > Adaptation set > Components > Representation.

The difference between live and VOD is made in the <MPD> XML element, by the type attribute. If dynamic, we are streaming live content and clients need to refresh the playlist. If static, this is a VOD content and the playlist does not need a refresh.
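As a sketch, the skeleton of a live MPD shows this hierarchy (values are hypothetical and some mandatory attributes are omitted for brevity):

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="dynamic" minimumUpdatePeriod="PT10S">
  <Period id="1">
    <AdaptationSet mimeType="video/mp2t">
      <Representation id="audio_eng=95000-video=256000" bandwidth="351000"/>
    </AdaptationSet>
  </Period>
</MPD>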

However, this manifest refresh is not mandatory since the player knows the duration of each fragment thanks to the fragment meta-data. It can guess the URLs of the future fragments.

The segment URLs are defined by either:

• a segmentList where the segments are listed one by one

• a segmentTemplate where you will find a formula to compute the URL of the segments

Example: in the MPD from Figure 62, the URL of each segment is constructed by concatenating the beginning of the MPD URL (http://...), the base URL (ibc_video-), the Representation id (audio_eng=95000-video=256000), a dash (-), the segment number (starting from 1) and the .ts extension.

The URL of the first segment of the first video representation will be: http://server/URL/ibc_video-audio_eng=95000-video=256000-1.ts
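A segmentTemplate producing these URLs could look like this sketch, where $RepresentationID$ and $Number$ are the standard DASH template identifiers and the duration attribute (here 10 seconds with the default timescale) is an assumption:

<SegmentTemplate media="ibc_video-$RepresentationID$-$Number$.ts" startNumber="1" duration="10"/>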

Input formats

An origin server can be fed with various input formats; for example, many encoders can create HLS directly.

Smooth Streaming as a pivot format

Microsoft not only designed an OTT protocol, they also released the software (IIS), the file format for storage and the protocol for communication between the encoder and the origin server.

Smooth Streaming server file format

Smooth Streaming storage file format is PIFF/ISMV (see PIFF/ISMV). The file extensions are:

• ismv for files containing video and audio tracks
• isma for files containing audio tracks only

The storage can be done in a single multibitrate pre-fragmented file or in one pre-fragmented file per bitrate. The file is virtually split up into chunks when responding to a client request.

Server manifest

The server manifest is an XML file. Its file extension is ism for VOD and isml for live. It describes the relationship between media tracks, bitrates and files on disk.

<?xml version="1.0" encoding="utf-8"?>
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <body>
    <switch>
      <audio src="./mezz.ismv" systemBitrate="96000" systemLanguage="eng">
        <param name="trackID" value="1257" valueType="data" />
        <param name="trackName" value="audio_eng" valueType="data" />
        <param name="SamplingRate" value="24000" valueType="data" />
        <param name="Channels" value="2" valueType="data" />
      </audio>
      <video src="./mezz.ismv" systemBitrate="800000">
        <param name="trackID" value="2" valueType="data" />
        <param name="MaxWidth" value="626" valueType="data" />
        <param name="MaxHeight" value="352" valueType="data" />
      </video>
      <video src="./mezz.ismv" systemBitrate="1000000">
        <param name="trackID" value="1" valueType="data" />
        <param name="MaxWidth" value="852" valueType="data" />
        <param name="MaxHeight" value="480" valueType="data" />
      </video>
    </switch>
  </body>
</smil>
