Full Interactive Functions in MPEG-based
Video on Demand Systems
Kostas Psannis
Marios Hadjinicolaou
Dept of Electronic & Computer Engineering
Brunel University
Uxbridge, Middlesex, UB8 3PH,UK
Argy Krikelis
Aspex Technology Ltd
Abstract: - In this paper, we present an efficient method for supporting full interactive functions for MPEG-2 video streams. This method is based on storing multiple differently encoded versions of the same movie at the server. A normal version is used for normal playback, while several interactive versions are used for Fast/Slow/Jump Play. Interactive versions are produced by encoding every N-th frame of the original uncompressed video using a GOP pattern different from that of the normal version. The effectiveness of our approach in reducing the bandwidth allocated to the interactive stream is evaluated through extensive simulations. Moreover, the visual quality during the interactive mode is analyzed and the effect of re-quantization error is examined.
Key-Words: - MPEG, Full Interactive Functions/Operations, VCR control operation.
1 Introduction
The maturing of video compression technologies, magnetic storage subsystems, and broadband networking has made Video On Demand (VOD) over computer networks more viable than ever. To improve the marketability of VOD services and accelerate their wide-scale deployment, these services must support user interactivity at affordable cost. Attempts to make television interactive are almost as old as television itself. Interactive television means different things to different people. Some regard the choice among many different channels as interactivity already. Others would require the selection of a local video store to be available in real time, on demand, through a broadband connection. It is generally accepted that an interactive program is a program or service in which the content itself, the manner of its presentation, or even the order of its presentation can be affected by the viewer. Near Interactive Video On Demand services support a limited number of interactive functions, which reduces the Quality of Service (QoS) of client interactivity. To improve the marketability of VOD service, the client should interact with the content of the presentation deciding the viewing
schedule with the full range of interactive functions. This gives the desired multilevel QoS to the client. The full range of interactive functions includes play/resume, stop, pause, Jump Forward (JF)/Jump Backward (JB), Fast Forward (FF)/Fast Rewind (FR), Slow Down (SD), Slow Reverse (SR), and Rewind. The difficulty of supporting interactivity varies from one interactive function to another. A stop or pause followed by resume is relatively easy to support, since it does not require more bandwidth than what is already allocated for normal playback. On the other hand, fast scanning (FF, FR, JF, JB) involves displaying frames at multiple times the normal rate. Transporting and decoding frames at such a rate is prohibitively expensive and is not feasible with today's hardware decoders. In addition, implementing full VCR functionality with MPEG-coded video is not a trivial task. MPEG video compression is based on motion-compensated predictive coding. The MPEG structure [1] allows straightforward realization of forward normal play, but imposes several constraints on the other trick (interactive) modes.
Several approaches have been proposed to support interactivity in a VOD system. Interactive functions can be supported by dropping parts of the original MPEG-2 video stream [2], [3]. Typically, dropping is performed after compression and aims to reduce the transport and decoding requirements without causing significant degradation in video quality. Alternatively, interactive functions can be supported using separate copies of the movie that are encoded at a lower quality than the normal playback copy [4], [5]. Other conventional schemes that support interactive functions either display frames at a rate much higher than normal playback, for example 90 fps [6], or involve downloading the video data to a player device located at the customer premises (not real-time playout; downloading can be done prior to viewing) so that the customer can view it without further intervention from the network [7]. Moreover, interactive functions can be supported by transmitting additional data of the same movie, using additional storage at the customer premises [8], [9]. Rewind operations can be supported by transcoding the predictively coded frames at the client [10]. Note that none of the methods mentioned above supports full interactive operations with minimum additional resources.
In this work, we introduce an efficient approach for supporting the full range of interactive functions in a Video on Demand (VoD) system. Our method is based on storing multiple differently encoded versions of the same movie at the server. Interactive versions are produced by encoding every N-th frame of the original video using a GOP pattern different from that of the normal version. The rest of the paper is organized as follows. In Section 2 we describe the generation of the interactive bitstream. Section 3 presents the rate control schemes for the interactive bitstream. Section 4 analyzes the network bandwidth allocation for normal playback and the interactive mode. Section 5 describes the storage overhead. In Section 6 the full range of interactive functions is presented. In Section 7 the visual quality is analyzed through extensive simulations. Section 8 presents the effects of re-quantization error. Finally, conclusions and some points for open research are given in Section 9.
This research was supported by the British Council. Corresponding Author: K E Psannis
2 Preprocessing of Video Movies
To support interactive functions, the server maintains multiple, differently encoded versions of each movie. One version, referred to as the normal version, is used for normal-speed playback. The other versions, referred to as interactive versions, are used for fast play. Each interactive version is used to support Fast/Jump Forward/Backward, Slow Down/Reverse, and Reverse at a given speedup. The server switches between the various versions depending on the requested interactive function. Note that only one version is transmitted at any given instant. The corresponding interactive version is obtained by encoding every N-th (uncompressed) frame of the original movie as a sequence of I-P(M) frames ($N_{interactive}$ = variable, $M_{interactive} = 1$).
Effectively, this results in repeating the previous I-frame in the decoder, enhancing the visual quality during the interactive mode. The alternative stream must be encoded in its own GoP structure, independently of the normal stream. This GoP should provide a suitable bit rate and frame rate, so that the interactive mode requires neither additional bit rate nor increased complexity in the decoder module.
3 Rate Control Schemes
A common approach to controlling the size of an MPEG frame is to vary the quantization factor on a per-frame basis. This results in variable video quality during the Fast Forward/Rewind mode (the quality remains constant in the normal version). Varying the amount of quantization is the mechanism that provides constant-quality rate control. The quantized coefficients $QF(u,v)$ are computed from the DCT coefficients $F(u,v)$, the quantization scale $MQUANT$, and a quantization matrix $W(u,v)$, according to the following equation
$$QF(u,v) = \frac{16 \times F(u,v)}{MQUANT \times W(u,v)} \qquad (1)$$
The quantization step sets many of the values in the coefficient matrix to zero and makes the rest smaller. The result is a significant reduction in the number of coded bits with no visually apparent difference between the decoded output and the original source data.
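As a minimal sketch of equation (1), the following Python function quantizes a block of DCT coefficients. The rounding rule (nearest integer, halves away from zero), the function name `quantize_block`, and the toy 2x2 block are assumptions made for illustration; the equation itself does not specify rounding, and real MPEG-2 blocks are 8x8.

```python
def quantize_block(F, W, mquant):
    """Apply QF(u,v) = 16 * F(u,v) / (MQUANT * W(u,v)) to every coefficient.

    Rounding to the nearest integer (halves away from zero) is an assumption
    of this sketch; equation (1) does not state a rounding rule.
    """
    def round_half_away(x):
        sign = 1 if x >= 0 else -1
        return sign * int(abs(x) + 0.5)

    return [
        [round_half_away(16 * f / (mquant * w)) for f, w in zip(f_row, w_row)]
        for f_row, w_row in zip(F, W)
    ]


# Toy 2x2 block with illustrative values (not taken from the paper).
F = [[400, 40], [-24, 6]]
W = [[8, 16], [16, 32]]
coarse = quantize_block(F, W, mquant=8)  # larger scale: smaller values, more zeros
fine = quantize_block(F, W, mquant=2)    # smaller scale: more detail retained
```

With the larger scale the smallest coefficient becomes zero, which is the effect on coded bits described above.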
The quantization factor may be varied in two ways:
• Varying the quantization scale ($MQUANT$)
• Varying the quantization matrix ($W(u,v)$)
To bound the size of the transcoded I-frames ($I_{transcoded}$) of the interactive mode, the encoder uses two predefined (upper and lower) thresholds. An I-frame is re-encoded such that its size satisfies
$$Threshold_{lower} \le I_{transcoded} \le Threshold_{upper} \qquad (2)$$
After an I-frame of a scan version has been re-encoded as an I-frame, the encoding algorithm checks whether the size of the compressed frame is between the two predefined thresholds. If it is not, the quantization factor for the corresponding frame is increased or decreased and the frame is re-encoded.
Two different schemes can be used to initialise the quantization factor when an I-frame is to be re-encoded. In the first scheme, when an I-frame is re-encoded for the first time, the encoder starts with a fixed quantization factor. The main problem with this method is that the quantization value might be kept unnecessarily high, resulting in low quality during the interactive operations. Moreover, the encoder in this scheme uses only one predefined threshold ($I_{transcoded} \le Threshold_{upper}$). In the second scheme, the encoder tries to track the nominal quantization value that was used to encode the same type of frame in the normal version. In the first encoding attempt, the encoder starts with the nominal quantization value that was used to encode the same I-frame of the normal version. After the first encoding attempt, if the resulting frame size is between the two predefined thresholds (2), the encoder proceeds to the next frame. Otherwise, the quantization factor (here the quantization matrix $W(u,v)$) is varied and the same frame is re-encoded. The quantization matrix can be modified by keeping the same values at the near-dc coefficients but using a different slope towards the higher-frequency coefficients. This procedure is repeated until the size of the compressed frame is between the two predefined thresholds (2). The advantage of this scheme is that it tries to produce interactive operations with the same constant quality as normal playback; when this is not possible, it minimizes the fluctuation in video quality during the interactive mode.
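The re-encoding loop described above can be sketched as follows. This is an illustrative simplification, not the authors' implementation: it adjusts a single scalar quantization factor, whereas the second scheme in the text reshapes the quantization matrix. The callable `encode`, the range 1..31, and the toy size model are all hypothetical.

```python
def bound_i_frame_size(encode, q_start, lower, upper, q_min=1, q_max=31):
    """Re-encode one I-frame until lower <= size <= upper (inequality (2)).

    encode(q) is a hypothetical callable returning the compressed size in
    bytes for quantization factor q. Starting from the nominal factor of the
    normal version mimics the second scheme; a fixed constant mimics the first.
    """
    q = q_start
    visited = set()
    while True:
        size = encode(q)
        if lower <= size <= upper:
            return q, size
        visited.add(q)
        # Frame too large: quantize more coarsely. Too small: more finely.
        step = 1 if size > upper else -1
        q_next = min(max(q + step, q_min), q_max)
        if q_next in visited:
            return q, size  # no factor in the allowed range meets both thresholds
        q = q_next


def toy_encoder(q):
    # Assumed toy model: frame size shrinks as the quantization factor grows.
    return 120000 // q


from_nominal = bound_i_frame_size(toy_encoder, 8, lower=9000, upper=10000)
from_coarse = bound_i_frame_size(toy_encoder, 20, lower=9000, upper=10000)
```

Starting close to the nominal factor typically needs fewer re-encoding passes than starting from an arbitrary fixed value, which is one practical argument for the second scheme.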
4 Network Bandwidth Allocation
Since the network bandwidth usually is the highest concern, the video during the normal version is coded with all I-P-B frames in order to achieve high compression ratios for the transport over the network with minimum bandwidth resources. In addition it is desirable to support interactive operations with minimum requirement in the network bandwidth.
4.1 Normal Playback
Assume that the MPEG-2 file has a playback rate of 30 frames per second. Note that there are no hardware or software MPEG decoders that can run faster than 30 frames per second. For our experiment we used an MPEG-2 encoder developed by the Software Simulation Group (SSG) [11]. We encoded 180 frames of the football clip. The Group of Pictures (GoP) format was N=15 and M=3. Therefore we had a larger I- or P-frame, followed by two smaller B-frames. Figure 1 depicts the bit rate allocation for the football sequence.
[Figure 1 plot: frame size (bytes) versus frame number for I-, P- and B-frames]
Figure 1: Bit rate allocation for the football sequence
According to Figure 1, the average bandwidth during normal playback is given by
$$Average(IPB)_{Size} \times 8\,\frac{bits}{byte} \times 30\,fps = 1.98\ \text{Mbps}$$
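The arithmetic behind this figure can be checked with a short Python sketch. The result is expressed in megabits per second, since the formula multiplies by 8 bits per byte. The average frame size of 8250 bytes is not stated in the paper; it is the value implied by the reported 1.98 result and is used here only as an assumption.

```python
def average_bandwidth_mbps(avg_frame_bytes, fps=30):
    """Average bit rate in Mbps: average frame size x 8 bits/byte x frame rate."""
    return avg_frame_bytes * 8 * fps / 1e6


# 8250 bytes is an assumed average frame size, back-derived from the 1.98 figure.
normal_playback = average_bandwidth_mbps(8250)
```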
4.2 Interactive mode
The total number of I-P(M) frames in the interactive mode can be computed as follows
$$TF^{Interactive}_{I,P(M)} = I \times N_{interactive} \qquad (3)$$
where,
• $I$ is the number of I-frames ($I = TF/N$)
• ($N_{interactive} = 5$, $M_{interactive} = 1$) are the new re-encoding parameters.
The bit rate allocation during the interactive mode depends on the proposed encoding scheme. Figure 2 depicts the bit rate allocation for the proposed rate control schemes.
[Figure 2 plot: frame size (bytes) versus frame number for the 1st and 2nd schemes]
Figure 2: Frame sizes of interactive bitstream
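As a quick check of equation (3) with the parameters of this experiment (180 frames, N=15, N_interactive=5), the following Python sketch counts the frames of the interactive stream. The function name is hypothetical, and the use of integer division for a sequence length that is not a multiple of N is an assumption.

```python
def interactive_frame_count(total_frames, n_normal, n_interactive):
    """Equation (3): TF_interactive = I x N_interactive, with I = TF / N."""
    i_frames = total_frames // n_normal  # one I-frame per GoP of the normal version
    return i_frames * n_interactive


count = interactive_frame_count(180, 15, 5)  # 12 I-frames, 5 frames per interactive GoP
```

The result matches the 60-frame horizontal axis of Figure 2.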
1st Scheme
• {w(u,v) = fixed}
• $I_{transcoded} \le Threshold_{higher} = I_{average}$
The average bandwidth is given by
$$Average(IP)^{interactive}_{Size} \times 8\,\frac{bits}{byte} \times 30\,fps = 0.53\ \text{Mbps}$$
where,
$$Average(IP)^{interactive}_{Size} = I^{interactive}_{average} \times \frac{1}{N_{interactive}} + P(M)^{interactive}_{average} \times \left(1 - \frac{1}{N_{interactive}}\right)$$
2nd Scheme
• {w(u,v) = variable}
• $\frac{9}{10} I_{average} \le I_{transcoded} \le I_{average}$
The bandwidth is given by
$$Constant(IP)^{interactive}_{Size} \times 8\,\frac{bits}{byte} \times 30\,fps = 1.23\ \text{Mbps}$$
5 Storage Overhead at the server
Generating separate copies for interactive operations comes at the expense of extra storage at the server. Let $TF$ be the number of frames in the normal version. The storage requirements of the normal version and the interactive mode are given by
$$W_{server} = TF \times Average(IPB)_{Size} \qquad (4)$$
$$W_{ExtraStorage} = TF^{Interactive}_{I,P(M)} \times (IP(M))_{Size} \qquad (5)$$
where,
$$(IP(M))_{Size} = \begin{cases} Average\{IP(M)\}, & \text{1st algorithm} \\ Constant\ IP(M), & \text{2nd algorithm} \end{cases}$$
Using (4) and (5) the following ratio can be derived.
$$\frac{W_{server}}{W_{ExtraStorage}} = \frac{N}{N_{interactive}} \times \frac{Average(IPB)_{Size}}{(IP(M))_{Size}} \qquad (6)$$
Table 1 shows the ratio obtained using equation (6) and typical characteristics of MPEG-2 compressed video (Figure 5).

Scheme        W_server / W_ExtraStorage
1st scheme    6.68
2nd scheme    5.66

Table 1: Increased ratio for the two rate control schemes.
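The step from equations (4) and (5) to the ratio (6) can be verified numerically. The Python sketch below computes the ratio both directly and in closed form; the function names and the frame sizes used in the example are illustrative assumptions, not values from the paper.

```python
def storage_ratio_direct(tf, n_normal, n_interactive, avg_ipb_size, ip_size):
    """W_server / W_ExtraStorage computed from equations (3), (4) and (5)."""
    w_server = tf * avg_ipb_size                      # equation (4)
    tf_interactive = (tf / n_normal) * n_interactive  # equation (3)
    w_extra = tf_interactive * ip_size                # equation (5)
    return w_server / w_extra


def storage_ratio_closed_form(n_normal, n_interactive, avg_ipb_size, ip_size):
    """Equation (6): (N / N_interactive) x (Average(IPB) size / IP(M) size)."""
    return (n_normal / n_interactive) * (avg_ipb_size / ip_size)


# Illustrative sizes in bytes (assumed, not measured).
direct = storage_ratio_direct(180, 15, 5, avg_ipb_size=8000, ip_size=4000)
closed = storage_ratio_closed_form(15, 5, avg_ipb_size=8000, ip_size=4000)
```

The total frame count TF cancels, so the ratio depends only on the two GoP lengths and the two average frame sizes.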
Note that for n interactive versions the relative increase in the storage overhead at the server is given by
$$\frac{\sum_{i=1}^{n} W_{ExtraStorage}(N_{interactive})}{W_{normal}}$$
6 Full range of Interactive Functions
To improve the marketability of VOD service, the client should interact with the content of the presentation deciding the viewing schedule with the full range of interactive functions.
• Fast Forward/Rewind is an operation in which the client browses the presentation in the forward/backward direction with a normal sequence of pictures. This function is supported by extracting all the I-frames of the normal stream in forward/backward order.
• Jump Forward/Backward is an operation in which the client jumps to a target time of the presentation in the forward/backward direction without a normal sequence of pictures, so the user jumps directly to a particular video location. Jump Forward/Backward is supported by skipping some I-frames of the normal stream in the forward/backward direction.
• Rewind can be supported by extracting all the I-frames in reverse order and generating as many P(M) frames as there are P- and B-frames in a Group of Pictures (GoP) of normal playback ($N_{interactive} = N$).
• Slow Down (SD) is an operation in which the video sequence is presented forward at a lower playback rate. This function can be supported by extracting all the I-frames in forward order and generating as many P(M) frames as there are P- and B-frames in a Recording Ratio (RR) of normal playback ($N_{interactive} = RR$).
• Slow Reverse is an operation in which the video sequence is presented backward at a lower playback rate. This function can be supported by extracting all the I-frames in backward order and generating as many P(M) frames as there are P- and B-frames in a double Group of Pictures (GoP) of normal playback ($N_{interactive} = 2 \times N$).
7 Analysis of the Visual Quality
We measure the visual quality of the interactive mode using the Peak Signal-to-Noise Ratio (PSNR), which also allows the two schemes to be compared with respect to video quality. We use the PSNR of the Y-component of a decoded frame. The PSNR is obtained by comparing the original raw frame with its decoded version, with encoding done using the proposed schemes. Figure 3 depicts the PSNR values for the motor race movie. Both schemes achieve acceptable visual quality because the PSNR is sufficiently large. The average PSNR value over the 60 frames is 46.03 dB for the first scheme and 47.06 dB for the second scheme, i.e., the average quality is slightly better under the second scheme.
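For reference, the PSNR of a luminance plane can be computed as in the following Python sketch. It uses the standard definition, 10 log10(peak^2 / MSE); the 8-bit peak value of 255 and the representation of a frame as a list of rows are assumptions, since the paper does not state how its PSNR values were computed.

```python
import math


def psnr_y(original, decoded, peak=255):
    """PSNR in dB between two equally sized luminance (Y) planes."""
    squared_errors = [
        (o - d) ** 2
        for o_row, d_row in zip(original, decoded)
        for o, d in zip(o_row, d_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical planes
    return 10 * math.log10(peak ** 2 / mse)


# Toy 2x2 planes with illustrative pixel values.
raw = [[100, 100], [100, 100]]
dec = [[101, 99], [100, 100]]
quality = psnr_y(raw, dec)
```

Averaging this value over all frames of the interactive stream gives per-scheme figures of the kind reported above.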
[Figure 3 plot: Motor Race Trace; PSNR (dB) versus frame index for the 1st and 2nd schemes]
Figure 3: PSNR for encoded frames in the interactive mode
8 Re-quantization error
If the server has only the forward bitstream (i.e., the original sequence is unavailable), we can decode the forward bitstream and re-encode every N-th frame of the video sequence using a different encoding structure (I-P(M) frames). The I-frames (N-th frames) are transcoded (decoded and re-encoded) using the simplified transcoder (Figure 4). Its lower complexity is due to two factors.
• I-frames are encoded and decoded without reference to any other frame.
• P(M) frames are generated by applying motion compensation between identical frames, producing Marionette P(M)-frames (0 bits).
Thus the output rates $R_{out}^{I}$ and $R_{out}^{P(M)}$ are given by
$$R_{out}^{I} = Q_2[DCT(x_n^1)] \qquad (7)$$
$$R_{out}^{P(M)} = Q_2[DCT(e_n^2)] \qquad (8)$$
Figure 4: Proposed simplified transcoder
The decoded pictures $x_n^1$ and the corresponding prediction error $e_n^2$ are given by
$$x_n^1 = IDCT[Q_1^{-1}(R_{in}^{I})] \qquad (9)$$
$$e_n^2 = x_n^1 - M_{P(M)}(x_n^2) \qquad (10)$$
Considering the orthonormality of the DCT, the following equations can be derived
$$R_{out}^{I} = Q_2[Q_1^{-1}(R_{in}^{I})] \qquad (11)$$
$$R_{out}^{P(M)} = Q_2[Q_1^{-1}(R_{in}^{I}) - DCT(M_{P(M)}(x_n^2))] \qquad (12)$$
where $M_{P(M)}(x_n^2)$ is the motion compensation function. Since the first quantization is performed independently of the subsequent re-quantization, the re-quantization error cannot be avoided [12]. The effect of the re-quantization error is depicted in Figure 5. The figure shows the PSNR difference of the I-P(M) frames between the following bit streams
• Encoded-only {Encode every N-th frame of the original uncompressed video sequence as a sequence of I-P(M) frames using the two rate control schemes}
• Transcoded {Transcode (encode/decode/re-encode) the intra coded frames as a sequence of I- P (M) frames using equations (11) and (12) and the two rate control schemes}
[Figure 5 plot: Re-quantization Error; PSNR difference (dB) versus frame index for the 1st and 2nd schemes]
Figure 5: PSNR difference due to re-quantization error for the two rate control schemes
From Figure 5 it can be concluded that the re-quantization error leads to a drop in picture quality of about 1.4 dB in all I-P(M) frames for the 1st rate control scheme and approximately 1.7 dB for the 2nd rate control scheme.
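The origin of the re-quantization error can be illustrated on a single coefficient. The Python sketch below compares quantizing a value directly with step q2 against the transcoding path of equation (11), in which the value has already been quantized with step q1. The uniform scalar quantizer, the function names, and the numeric values are simplifying assumptions for illustration; they are not the MPEG-2 quantizer or data from the experiments.

```python
def quantize(value, step):
    """Uniform scalar quantizer: index of the nearest multiple of step."""
    sign = 1 if value >= 0 else -1
    return sign * int(abs(value) / step + 0.5)


def reconstruct(index, step):
    """Inverse quantization: map an index back to a coefficient value."""
    return index * step


def direct_error(coeff, q2):
    """Error when the original coefficient is quantized once with step q2."""
    return abs(coeff - reconstruct(quantize(coeff, q2), q2))


def transcoded_error(coeff, q1, q2):
    """Error when the coefficient is first coded with q1, then re-quantized with q2."""
    first_pass = reconstruct(quantize(coeff, q1), q1)         # Q1 followed by Q1^-1
    second_pass = reconstruct(quantize(first_pass, q2), q2)   # Q2 applied to the result
    return abs(coeff - second_pass)


encoded_only = direct_error(37, q2=16)
transcoded = transcoded_error(37, q1=10, q2=16)
```

In this example the first quantization moves the coefficient across a decision boundary of the second quantizer, so the transcoded value ends up further from the original than the encoded-only value.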
9 Conclusions
In this paper, we presented an approach for supporting interactive operations in a Video On Demand system. The full range of interactive operations is supported by storing multiple differently encoded versions of the same movie at the server. The corresponding interactive version is obtained by encoding every N-th (uncompressed) frame of the original movie as a sequence of I-P(M) frames. Effectively, this results in repeating the previous I-frame in the decoder. Two rate control algorithms have been proposed to bound the size of the interactive stream. Simulation results show that the visual quality of the secondary bit stream is acceptable for both rate control algorithms. Future research will focus on applying the proposed concept to MPEG-4 streams.
References:
[1] International Standard 13818-2. Information Technology - Generic Coding of Moving Pictures and Associated Audio: Video.
[2] Banu Ozden, Alexandros Biliris, Rajeev Rastogi, Avi Silberschatz. "A Low-Cost Storage Server for Movie on Demand Databases". In: Proc. of the 20th VLDB Conference, Santiago, Chile, 1994, pp. 594-605.
[3] H. J. Chen, A. Krishnamurthy, T. D. C. Little, D. Venkatesh. "A Scalable Video-on-Demand Service for the Provision of VCR-Like Functions". In: Proc. IEEE Multimedia, 1995, pp. 65-71.
[4] Marwan Krunz, George Apostolopoulos. "Efficient Support for Interactive Scanning Operations in MPEG-based Video on Demand". Multimedia Systems, vol. 8, no. 1, Jan. 2000, pp. 20-36.
[5] Prashant J. Shenoy, Harrick M. Vin. "Efficient Support for Scan Operations in Video Servers". In: Proc. ACM Multimedia Conference, November 1995, ACM Press, San Francisco, CA, pp. 131-140.
[6] Jayanta K. Dey-Sircar, James D. Salehi, James F. Kurose, Don Towsley. "Providing VCR Capabilities in Large-Scale Video Servers". In: Proc. ACM Multimedia Conference, Oct. 1994, San Francisco, CA, pp. 25-32.
[7] M.-S. Chen, D. D. Kandlur. "Downloading and Stream Conversion: Supporting Interactive Playout of Video in a Client Station". In: Proc. IEEE Multimedia Conference, 1995.
[8] Kostas Psannis, Marios Hadjinicolaou. "Transmitting Additional Data of MPEG-2 Compressed Video to Support Interactive Operations". In: Proc. IEEE ISIMP01, Hong Kong, May 2001, pp. 308-311.
[9] Kostas Psannis, Marios Hadjinicolaou. "A New Methodology for Implementing Interactive Operations of MPEG-2 Compressed Video". In: Proc. IEEE Second International Workshop on Digital and Computational Video, Tampa, USA, Feb. 2001, pp. 21-28.
[10] Kostas Psannis, Marios Hadjinicolaou. "Support Backward Interactive Functions of MPEG-2 Stream". In: Proc. WSES Evolution Computation, Feb. 2002, Switzerland.
[11] MPEG Software Simulation Group. "MPEG-2 Encoder/Decoder Version 1.2". ftp://ftp.mpeg.org/pub/mpeg/mssg
[12] P. Assuncao, M. Ghanbari. "A frequency-domain video transcoder for dynamic bit-rate reduction of MPEG-2 bitstreams". IEEE Trans. Circuits Syst. Video Technol., Dec. 1998.