Enhancement layer rate control for high bitrate SNR scalable video coding
Jun Xie, Liang-Tien Chia
*Division of Computer Communication, School of Computer Engineering, Nanyang Technological University, 639798, Singapore
Received 5 March 2003; accepted 8 July 2004 Available online 25 September 2004
Abstract
Signal-to-noise ratio (SNR) scalability has been incorporated into the MPEG-2 video-coding standard to allow for the delivery of two services with the same spatial and temporal resolution but different levels of quality. Scalable video coding has many advantages, such as its capability of coping with bandwidth variations and its high flexibility. However, few accurate rate control schemes for enhancement layer coding have been proposed. In this paper, we present a novel enhancement layer rate control scheme for MPEG-2 SNR scalable video coding. First, we address the current necessity and problem of rate control for layered coding. Then, by analyzing the characteristics of compressed data in the enhancement layer, we derive our rate control model. The proposed rate control model is applied to a drift-free SNR scalable encoder, and we show that it performs well for coding of the enhancement-layer bitstream.
© 2004 Elsevier Inc. All rights reserved.
Keywords: Rate control; SNR scalable coding; Bit allocation; MPEG-2; Video coding and transmission
1047-3203/$ - see front matter 2004 Elsevier Inc. All rights reserved.
doi:10.1016/j.jvcir.2004.07.001
*Corresponding author. Fax: +65 6792 6559.
E-mail address: [email protected] (L.T. Chia).
1. Introduction
The idea of scalable video coding is that a scalable video encoder should generate at least two bitstreams. One is the base layer (BL), which carries the basic and most vital video information and can be decoded independently. The others are enhancement layers (ELs), which add residual information to enhance the quality of the BL image and must be decoded together with the BL bitstream. Compared with non-scalable coding, scalable coding has two main advantages. First, it provides a scalable and more flexible service, so with an effective strategy it can better satisfy the different video quality requirements of different clients (Wu et al., 2001). Second, scalable coding can cope with problems that arise during multimedia communication, such as packet loss and congestion (Gallant and Kossentini, 2001; Wang et al., 2001).
Rate control is widely used to trade off the presented visual quality against the compressed bitrate. Many rate control algorithms in the literature (MPEG-2 Video Test Model 5 and JTC1/SC29/WG11, 1993; Wang, 2000; Ding and Liu, 1996; Tao et al., 2000; Chiang and Zhang, 1997; Ribas-Corbera and Lei, 1999; Hang and Chen, 1997; He and Mitra, 2002a; He et al., 2001) have been proposed for non-scalable coding, based on analyzing the characteristics of the data source and the encoding behavior. There is also a need to maximize the picture quality at different bitrate limits when we deliver multiple layered bitstreams over channels with fluctuating quality of service (QoS) or even without any QoS guarantee. In SNR scalable coding, we can apply conventional rate control algorithms to the BL directly because there is little difference between BL and non-scalable compression. However, conventional rate control algorithms are not optimized for EL compression, because certain characteristics of EL coding behavior differ from those of BL coding. For example, most analytical rate models (Chiang and Zhang, 1997; Ribas-Corbera and Lei, 1999; Hang and Chen, 1997) are established by approximating the probability distribution of the DCT (AC) coefficients as a Laplacian distribution; however, the probability density functions (PDFs) of the EL source are uncertain and depend on the quantization step size of the BL. In the extreme case, the PDF will be uniform if the quantization level of the BL approaches infinity (Jayant and Noll, 1984). It is, therefore, difficult to model the PDF of the EL data source in a tractable way and subsequently achieve an accurate rate model. Take as another case the recent ρ-domain rate control (He and Mitra, 2002a; He et al., 2001), which is largely source-independent: the slope of the linear rate model, which is the ratio between the bitrate and the number of non-zeros among the quantized coefficients, is nearly constant and stable in non-scalable coding, but it varies quite significantly in EL coding. Such a rate model therefore still needs to be improved by taking the characteristics of EL compression into account. In addition, it is observed that if we follow the traditional optimum bit allocation (OBA) solution (Jayant and Noll, 1984), a substantial number of the allocated values are negative, and a rate control error appears if those negative values are simply set to zero. Therefore, it is not accurate enough to apply rate control algorithms designed for non-scalable coding to the EL directly. In previous work on scalable coding, two ideas for optimizing the quantization setting of the EL were proposed: performing rate-distortion (RD) optimization using a Lagrangian multiplier function (Wilson and Ghanbari, 1999; Gallant et al., 1999), and employing characteristics of the human visual system (HVS) (Lee et al., 1997). But neither gave a practical scheme to control the bitrate of the different layers. In another study, the rate control algorithms for an SNR scalable coder proposed in Miloslavsky and Zakhor (1999) did not provide any analytic expression or deep insight into the behavior of EL compression. Therefore, the problem of rate control for scalable coding has not been fully investigated.
In this paper, our work centers on SNR scalable high-bitrate video coding, and we propose a rate control algorithm based mainly on the following three points: first, the approximately linear relationship between the ultimate bitrate of the EL and the percentage of zero-valued coefficients after quantization; second, a strong correlation between the bitrate and the ratio between the quantization parameter of the BL, denoted $Q_{BL}$, and that of the EL, denoted $Q_{EL}$; third, a practical solution to some problems of EL OBA. After applying our approach to an MPEG-2 drift-free SNR scalable coder and comparing it with the classical TM5 rate control algorithm (MPEG-2 Video Test Model 5 and JTC1/SC29/WG11, 1993), we find the results of our proposed algorithm promising.
This paper is organized as follows. In Section 2, we describe two different structures of the two-layered MPEG-2 SNR scalable coder. In Section 3, we analyze the characteristics of SNR scalable coding and describe some useful relationships employed in our scheme. Section 4 presents the RD optimization scheme for EL coding. In Section 5, we describe our rate control framework. Experimental results for the complete scheme are provided in Section 6, while the experimental results that support each analysis are included in the respective sections. Lastly, we draw conclusions.
2. Two-layered MPEG-2 SNR scalable coder
SNR scalability is designed to allow for the delivery of two services with the same spatial and temporal resolution but different levels of quality.
The general structure of an SNR scalable coder is shown in Fig. 1. The encoding process of the BL is the same as that of a non-scalable encoder. The BL by itself contains some quantization error, and the EL is designed to encode this error in order to improve the quality of the whole picture. For I frames and P frames, both the BL and EL compressed data are inverse quantized to construct the stored frame used for later motion prediction and motion compensation.
This type of structure has a drawback due to the tight coupling between the two layer bitstreams (Ghanbari, 1999). If any information in EL bitstreams is lost and the BL bitstream is decoded by itself, decoded pictures of BL will suffer from picture drift, which causes picture quality degradation.
Naturally, drift-free pictures are required, and this can be achieved by loosening the tight coupling between the two layers. In Ghanbari (1999) and Arnold and Frater (2000), several drift-free encoder structures are proposed. We adopt the structure shown in Fig. 2, which removes the EL contribution to prediction, to study the behavior of EL data compression.
3. Characteristics of enhancement layer compression
To achieve robust and accurate rate control schemes, there is a need to find an effective rate model based on the analysis of coding behaviors. In this section, we will analyze some important characteristics of EL coding and then describe some key relationships employed by our rate control algorithm.
Fig. 1. A two-layer SNR scalable encoder with drift at the base layer.
Fig. 2. A two-layer SNR scalable encoder without drift.
We first define some abbreviations for the EL:
$R_{EL}$: output bitrate,
$R_{AC}$: number of bits for AC coefficient compression,
$C$: number of bits for generalized syntax description,
$P_{NZ}$: percentage of non-zero coefficients among the quantized DCT coefficients,
$Q_r$: ratio between $Q_{BL}$ and $Q_{EL}$, as in Eq. (1):
\[ Q_r = \frac{Q_{BL}}{Q_{EL}}. \tag{1} \]
3.1. Relationship between $R_{EL}$ and $Q_r$
In BL coding, the quantizer step size is specified for each DCT coefficient in a macroblock (MB) via a quantization matrix of 64 elements, denoted quant_matrix. The other important parameter is the quantization scale factor mquant. The quantizer step size used for quantizing the ith DCT coefficient in each of the blocks in a particular MB is given by $\mathrm{quant\_matrix}_i \cdot \mathrm{mquant}$. Thus, according to uniform quantization theory, the amplitude of the quantization error is distributed over the range
\[ \left[ -\frac{\mathrm{quant\_matrix}_i \cdot \mathrm{mquant}}{2},\; \frac{\mathrm{quant\_matrix}_i \cdot \mathrm{mquant}}{2} \right]. \]
To express the quantization error over the whole BL picture, the range is rewritten as
\[ \left[ -\frac{Q_{BL} \cdot \mathrm{quant\_matrix}_i}{2},\; \frac{Q_{BL} \cdot \mathrm{quant\_matrix}_i}{2} \right]. \tag{2} \]
It is the quantization noise of the BL that the EL encodes, so the source data for the EL to quantize are distributed over the range (2), which is an important characteristic of EL coding. It shows that there must be a strong relationship between $Q_{BL}$ and the choice of $Q_{EL}$. Fig. 3 shows the probability distribution of the AC coefficients of the BL and EL for the Stefan and Mobile & Calendar sequences. The AC coefficients in the BL are usually modelled as a Laplacian distribution (Lam and Goodman, 2000). After BL quantization, the amplitude of the AC coefficients in the EL lies within the range (2). It is observed that in most cases the PDF of the AC coefficients in the EL still has a shape similar to that of the BL, and that it is also affected by $Q_{BL}$.
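The range (2) can be checked numerically. The following is a minimal sketch, assuming an idealized uniform quantizer with plain rounding and no dead zone (a simplification of the MPEG-2 quantizer), a synthetic Laplacian source, and hypothetical values for the step size and scale; none of these names or numbers come from the paper.

```python
import random

def quantize_residual(coeff, step):
    """Idealized uniform quantization: return (reconstruction, residual)."""
    level = round(coeff / step)
    recon = level * step
    return recon, coeff - recon

def laplacian_sample(rng, scale):
    """Draw one zero-mean Laplacian sample with the given scale."""
    magnitude = rng.expovariate(1.0 / scale)
    return magnitude if rng.random() < 0.5 else -magnitude

rng = random.Random(0)
step = 16.0  # hypothetical BL step size, i.e. Q_BL * quant_matrix_i
residuals = [quantize_residual(laplacian_sample(rng, 20.0), step)[1]
             for _ in range(10000)]
# The residual is the source that the EL would encode.
max_abs = max(abs(r) for r in residuals)
```

Under these assumptions every residual stays within half a step of zero, which is the bounded source that the EL quantizer sees.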
In general, analytic rate control approaches for non-scalable coding are based on analyzing the PDF of the DCT coefficients. With a uniform scalar quantizer of step size $Q$, the difference frame rate is estimated as $R(Q) = k \cdot H(Q)$, where $H(Q)$ is the empirical entropy of the $Q$-quantized coefficients and $k$ is an empirical constant. This constant accounts for multiplicative factors, including the mismatch between entropies computed from the ideal and the practical DCT coefficient distributions, and the effect of the entropy coding methods applied after quantization, such as run-length coding (RLC) and variable-length coding (VLC) (Hang and Chen, 1997). According to general high-bitrate coding theory and uniform quantization theory, we use the approximation (3) in Ribas-Corbera and Neuhoff (1996) to compute the difference frame entropy of the EL, denoted $H(Q)$ or $R(Q)$, where $\sigma^2$ is the variance of the difference frame pixels. We use (4) in Gersho and Gray (1992) to measure the distortion ($D$) under the mean square error (MSE) criterion:
\[ R(Q) = \frac{1}{2} \log_2 \frac{2 e^2 \sigma^2}{Q^2}, \tag{3} \]
\[ D = \frac{Q^2}{12}. \tag{4} \]
For EL coding, the distortion within the BL is the source for the EL to encode. Therefore, the variance of the EL difference frame pixels, denoted $\sigma_{EL}^2$, is the distortion of the BL measured by MSE. Thus, (4) can be rewritten as (5):
\[ \sigma_{EL}^2 = \frac{Q_{BL}^2}{12}. \tag{5} \]
Fig. 3. Probability distribution of AC (0,1) coefficients: (A) and (C) the BL and EL in Mobile & Calendar, respectively; (B) and (D) the BL and EL in Stefan, respectively.
According to traditional information theory, the RD function is expressed by (6), where $\sigma^2$ is the variance of the data source and the factor $\epsilon^2$ depends on the PDF of the data source as well as on the type of encoding used (Jayant and Noll, 1984):
\[ D(R) = \epsilon^2 \sigma^2 2^{-2R}. \tag{6} \]
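Applying expressions (4)–(6) to EL coding, we obtain the group of formulas (7). From (7), we obtain the approximate expression (8) for the EL frame entropy, as shown in Fig. 4:
\[
\begin{cases}
D_{EL}(R_{EL}) = \epsilon^2 \sigma_{EL}^2 2^{-2 R_{EL}}, \\
\sigma_{EL}^2 = \dfrac{Q_{BL}^2}{12}, \\
D_{EL} = \dfrac{Q_{EL}^2}{12},
\end{cases} \tag{7}
\]
\[ R_{EL}(Q_{EL}) = \log_2 \frac{Q_{BL}}{Q_{EL}} + \log_2 \epsilon. \tag{8} \]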
The distortion of the BL can be reduced only when $Q_{EL} < Q_{BL}$ and $Q_{EL}$ is greater than 1. Therefore, $Q_r$ is in the range $[1, Q_{BL}]$. Moreover, in practice $Q_r$ varies over a small range because quantization parameters that are either too large or too small are seldom used. Thus, formula (8) can be simplified to:
\[ R_{EL}(Q_{EL}) \propto (Q_r - 1). \tag{9} \]
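The step from (8) to (9) can be illustrated numerically. The sketch below evaluates Eq. (8) and a first-order expansion of $\log_2 Q_r$ about $Q_r = 1$. The function names, the default $\epsilon = 1$ and the proportionality constant $1/\ln 2$ are assumptions made for illustration only; the paper itself states only the proportionality in (9).

```python
import math

def el_entropy(q_bl, q_el, eps=1.0):
    """Eq. (8): R_EL = log2(Q_BL / Q_EL) + log2(eps), in bits per sample."""
    return math.log2(q_bl / q_el) + math.log2(eps)

def el_entropy_linear(q_bl, q_el):
    """First-order expansion of log2(Q_r) about Q_r = 1.

    The result is proportional to (Q_r - 1), as in Eq. (9); the constant
    1 / ln(2) is an assumption that follows from the Taylor expansion.
    """
    q_r = q_bl / q_el
    return (q_r - 1.0) / math.log(2.0)

# Compare the two forms for a few hypothetical EL quantizers with Q_BL = 40.
for q_el in (38.0, 32.0, 20.0):
    print(q_el, el_entropy(40.0, q_el), el_entropy_linear(40.0, q_el))
```

The two forms agree closely when $Q_r$ is near 1 and drift apart as $Q_r$ grows, so the linear form is best read as a local approximation.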
Fig. 4. Plot of $R_{EL}$ versus $Q_r$.
In the experiments, we use the test sequences Mobile & Calendar, Boating, and Stefan to verify the relationship shown in formula (9). The mquant of the BL is fixed at 40, and the first frame of each sequence is encoded with different values of $Q_r$. The relationship between the actual frame entropy of the EL and $Q_r$ is shown in Fig. 5. The correlation coefficient, expressed as (10), is used to assess the relationship, where $\mathrm{Cov}(x, y)$ is the covariance and $\mathrm{Var}(x)$ and $\mathrm{Var}(y)$ are the variances. For the three pictures, the correlation coefficients are 0.9998, 0.9981 and 0.9994, respectively, which implies that a linear relationship exists between the two quantities. The aim of our control algorithm for the EL therefore shifts from the traditional search for a suitable $Q_{EL}$ to a search for a suitable $Q_r$, from which the required $Q_{EL}$ is obtained:
\[ \rho_{xy} = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)} \sqrt{\mathrm{Var}(y)}}. \tag{10} \]
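For reference, the correlation coefficient of Eq. (10) can be computed as in the minimal sketch below. The function name and the sample data are illustrative and are not measurements from the paper.

```python
import math

def correlation(xs, ys):
    """Eq. (10): Cov(x, y) / (sqrt(Var(x)) * sqrt(Var(y)))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))

# Hypothetical (Q_r, entropy) pairs that lie close to a straight line.
q_r = [1.0, 1.5, 2.0, 2.5, 3.0]
entropy = [0.02, 0.51, 0.98, 1.52, 2.01]
rho = correlation(q_r, entropy)
```

A value of `rho` close to 1 indicates that a linear model in $Q_r$ is a reasonable fit for the sampled range.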
3.2. Relationship between $R_{EL}$ and $P_{NZ}$
The ρ-domain rate control algorithms for non-scalable coders are based on the approximately linear relationship between the ultimate bitrate and the percentage of zero-valued quantized coefficients (He and Mitra, 2002a; He et al., 2001). We find that this linear relationship also exists in EL coding.
$R_{EL}$, as expressed in (11), is composed of the AC component, denoted $R_{AC}$, and the generalized header and syntax bits for the frame, denoted $C$, which is a relatively constant number. Note that all the coefficients in the EL are coded with the non-intra VLC table and that there is no DC component, because the EL data are of a differential nature (Information Technology and ISO/IEC, 1995):
\[ R_{EL} = R_{AC} + C. \tag{11} \]
An example of the relationship between $R_{AC}$ and $P_{NZ}$ is plotted in Fig. 6, where the BL is encoded with a fixed mquant equal to 50. We use Eq. (12) to express the approximately linear relationship, where $K_R$ is the slope; its characteristics are discussed in the following paragraph. On closer observation of Fig. 6, we can see that the approximately linear relationship does not hold for small values of $P_{NZ}$. In most EL coding scenarios, $P_{NZ}$ takes a small value at the frame level, e.g. about 0.05, and also varies significantly among MBs. This motivates us to investigate how the relationship between $R_{AC}$ and $P_{NZ}$ changes with $P_{NZ}$. In Fig. 7, we plot the correlation coefficients at different values of $P_{NZ}$. The correlation coefficient drops significantly for lower values of $P_{NZ}$, indicating that the linear rate model needs to be improved further for EL coding:
\[ R_{AC} = K_R \cdot P_{NZ}. \tag{12} \]
Fig. 5. Relationship between $R_{AC}$ and $Q_r$.
Fig. 6. Plot of $R_{AC}$ versus $P_{NZ}$.
Fig. 7. Plot of correlation coefficients versus $P_{NZ}$.
For non-scalable coding, the variation of $K_R$ is small (He and Mitra, 2002a; He et al., 2001), whereas for EL coding it varies as shown in Fig. 8, where the plots in the top row show the relationship between $K_R$ and $P_{NZ}$ and the plots in the bottom row show the relationship between $K_R$ and $R_{AC}$ when the BL is encoded with a fixed mquant equal to 50. It can be seen from Fig. 8 that $K_R$ is nearly constant only in the range of larger $P_{NZ}$, which corresponds to a high bitrate beyond that used in a typical video scenario. $K_R$ also exhibits a steep slope in the smaller $P_{NZ}$ range (corresponding to the lower bitrate range), which is an obstacle to realizing a simple rate control strategy. From (12), we can see that $K_R$ is simply a statistical parameter whose value indicates the average number of bits used for each non-zero coefficient, because in a hybrid DCT and entropy coder, bits are mainly assigned to coding non-zero coefficients. When non-zero coefficients occur with a very small probability, this does not match the cases that entropy coding techniques are designed to handle optimally, and $K_R$ shows a steep increase in the range of the lowest $P_{NZ}$, as shown in Fig. 8. In EL coding, $P_{NZ}$ is usually small at the frame level, and at the same time $K_R$ and $P_{NZ}$ vary greatly among MBs. It is not easy to establish an accurate and simple expression to describe the relationship between them. Therefore, we treat $K_R$ as a different constant in each class under an MB-classification strategy, which is described in Section 4.
3.3. Relationship between $Q_r$ and $P_{NZ}$
Another important observation is the strong correlation between $Q_r$ and $P_{NZ}$. Using the same test sequences and encoding strategy as above, Fig. 9 depicts the relationship for three sets of gathered data.
The correlation coefficients between $Q_r$ and $P_{NZ}$ for the sampled pictures of the different sequences are 0.9975, 0.9984 and 0.9968, respectively. Therefore, the approximately linear expression (13) can be used to model this relationship, where $K_Q$ is the slope. The linear relationship can also be seen in Fig. 9:
\[ Q_r = 1 + K_Q \cdot P_{NZ}. \tag{13} \]
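The slopes $K_R$ in (12) and $K_Q$ in (13) can be estimated from measured samples. The sketch below uses a least-squares fit constrained to pass through the origin; this fitting method and the helper names are assumptions made for illustration, since the paper does not state how the slopes are fitted at this point.

```python
def fit_slope_through_origin(xs, ys):
    """Least-squares slope k for the model y = k * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def fit_k_r(p_nz, r_ac):
    """Eq. (12): R_AC = K_R * P_NZ."""
    return fit_slope_through_origin(p_nz, r_ac)

def fit_k_q(p_nz, q_r):
    """Eq. (13): Q_r - 1 = K_Q * P_NZ."""
    return fit_slope_through_origin(p_nz, [q - 1.0 for q in q_r])

# Hypothetical samples lying exactly on Q_r = 1 + 5 * P_NZ.
k_q = fit_k_q([0.1, 0.2, 0.4], [1.5, 2.0, 3.0])
```

Fixing the intercept keeps the fitted model consistent with $Q_r = 1$ at $P_{NZ} = 0$, which is the boundary case implied by (13).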
Fig. 8. Plots of $K_R$ versus $P_{NZ}$ and $K_R$ versus $R_{AC}$.
4. Optimum bit allocation scheme for EL coding
4.1. Theoretical model for EL optimum bit allocation
In this section, we will look at the distortion model for EL coding and propose a practical OBA scheme for EL coding at MB-level.
4.1.1. Distortion model for EL coding
Without accurate distortion and rate models, OBA cannot be carried out (Ding and Liu, 1996). If we want to achieve better coding performance through OBA, an accurate RD model for EL coding is therefore needed.
In He and Mitra (2002b), a related ρ-domain distortion model has been developed, which motivates us to study the relationship between the distortion (measured by MSE) and $P_{NZ}$ and to see whether it is suitable for EL coding. Let $D' = D_{EL} / \sigma_{EL}^2 \doteq D_{EL} / D_{BL}$ be the normalized distortion, where $\sigma_{EL}^2$ is the variance of the EL source, which is also equal to the distortion of the BL, $D_{BL}$, and $D_{EL}$ is the distortion of the EL. We plot $D'$ versus $P_{NZ}$ when the BL is encoded with a fixed scheme and the EL is encoded with different quantizers, as shown in Fig. 10. It can be observed that an exponential function still describes the relationship between $D_{EL}$ and $P_{NZ}$. Therefore, we use the distortion model proposed in He and Mitra (2002b) to express the EL distortion, as in (14), where $\alpha$ is a statistical parameter. Note that $\alpha$ takes a larger value than in non-scalable coding and that $P_{NZ}$ is usually small, e.g. about 0.05 at the frame level of EL coding:
\[ D_{EL}(P_{NZ}) = \sigma_{EL}^2 e^{-\alpha P_{NZ}}. \tag{14} \]
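A minimal sketch of the distortion model (14) and of its inversion for $\alpha$ follows. The function names and the numeric values are hypothetical and serve only to show that the two operations are consistent with each other.

```python
import math

def el_distortion(var_el, alpha, p_nz):
    """Eq. (14): D_EL = sigma_EL^2 * exp(-alpha * P_NZ)."""
    return var_el * math.exp(-alpha * p_nz)

def estimate_alpha(var_el, d_el, p_nz):
    """Invert Eq. (14) for alpha from one measured (D_EL, P_NZ) pair."""
    return math.log(var_el / d_el) / p_nz

# Hypothetical frame: EL source variance 12.0, alpha 14.0, P_NZ 0.05.
d_el = el_distortion(12.0, 14.0, 0.05)
alpha_back = estimate_alpha(12.0, d_el, 0.05)
normalized = d_el / 12.0  # D' as defined in the text
```

Because $P_{NZ}$ is small in EL coding, a fairly large $\alpha$ is needed before the normalized distortion $D'$ falls noticeably below 1, which agrees with the remark that $\alpha$ is larger than in non-scalable coding.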
4.1.2. Theoretical optimum bit allocation for EL coding
OBA is performed to assign a specific number of bits to each data source in order to minimize the overall distortion and achieve the best quality. This is usually solved with the Lagrangian method, formulated as
\[ F = \min(D + \lambda R). \tag{15} \]
Fig. 9. Relationship between $Q_r$ and $P_{NZ}$.
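With the rate model (12) and the distortion model (14) for EL coding, we can derive the theoretical OBA scheme (He and Mitra, 2002b). Let $\{S_i \mid 1 \le i \le L\}$ be the input sources and $R_T$ be the target number of bits. The problem can be stated as (16) for each data source, where $N_i$ is the size of source $S_i$, and the optimal solution for $S_i$ is given in (17):
\[
\begin{cases}
D_{EL_i} = \sigma_{EL_i}^2 e^{-\alpha_i P_{NZ_i}} N_i, \qquad R_{EL_i} = K_{R_i} P_{NZ_i} N_i, \\[4pt]
F = \min\limits_{P_{NZ_i}} \left[ \sum\limits_{i=1}^{L} \sigma_{EL_i}^2 e^{-\alpha_i P_{NZ_i}} N_i + \lambda \left( \sum\limits_{i=1}^{L} K_{R_i} P_{NZ_i} N_i - R_T \right) \right],
\end{cases} \tag{16}
\]
\[
R_{EL_i} = \xi_i N_i \, \frac{R_T - \sum_{j=1}^{L} \xi_j N_j \ln \dfrac{\sigma_{EL_j}^2}{\xi_j}}{\sum_{j=1}^{L} \xi_j N_j} + \xi_i N_i \ln \frac{\sigma_{EL_i}^2}{\xi_i}, \qquad \xi_i = \frac{K_{R_i}}{\alpha_i}. \tag{17}
\]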
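The closed-form solution (17) is straightforward to evaluate. The sketch below is one possible transcription, assuming strictly positive variances and parameters; the function name and the sample values are hypothetical.

```python
import math

def optimum_bit_allocation(variances, sizes, k_r, alpha, r_target):
    """Evaluate Eq. (17) for every source; results may be negative."""
    xi = [k / a for k, a in zip(k_r, alpha)]         # xi_i = K_Ri / alpha_i
    weights = [x * n for x, n in zip(xi, sizes)]     # xi_i * N_i
    logs = [math.log(v / x) for v, x in zip(variances, xi)]
    offset = (r_target - sum(w * l for w, l in zip(weights, logs))) / sum(weights)
    return [w * (offset + l) for w, l in zip(weights, logs)]

# Two hypothetical sources whose variances differ by a factor of 100.
bits = optimum_bit_allocation([1.0, 100.0], [1, 1], [1.0, 1.0], [1.0, 1.0], 2.0)
```

In this example the allocations sum to the target, but the low-variance source receives a negative value, which is the situation examined in Section 4.2.1.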
4.2. Challenges for practical optimum bit allocation for EL coding
The above result is only a theoretical solution. When applying it to EL coding, there are some constraints and challenges to overcome.
Fig. 10. Distortion curves of each frame in the sequences.
4.2.1. Negative allocated bit number
First of all, one constraint is neglected: $R_i \ge 0$. In practice, it is meaningless to assign a negative number of bits to a data source. However, the error cannot be ignored if we simply set the negative solutions of (17) to zero. For instance, in MB-level OBA, quite a lot of MBs are assigned a negative number of bits because of the large variance ratio among MBs in EL coding. The variance distribution and the corresponding theoretically assigned number of bits at the MB level are illustrated by the example in Fig. 11, which shows one frame of the Mobile & Calendar sequence when the BL and EL are both encoded at a target bitrate of 3 Mbit/s. If the negative allocations are simply set to zero, there will be a large mismatch between the actual allocated number of bits and the target number of bits.
Fig. 11. Example of the MB variance distribution and the corresponding assigned bit number after MB-level OBA.
4.2.2. Parameters $K_R$ and $\alpha$
Second, when we perform OBA according to formula (17), there is one uncertain key factor, namely $\xi_i = K_{R_i} / \alpha_i$. The property of $K_R$ has been discussed at the end of Section 3.2 and is shown in Fig. 8. As seen in Fig. 12, the frame-level value of $K_R$ differs when a Mobile & Calendar frame is encoded at different bitrates. Thus, for EL coding neither $K_R$ nor $\alpha$ is nearly constant, and both are therefore difficult to formulate, which is a barrier to performing bit allocation and deciding the EL quantization parameter. Even if we used some mathematical tool to estimate them approximately, the resulting high complexity and latency would threaten the stability of our rate control algorithm. In the following paragraphs, we present our schemes for dealing with these challenges.
4.3. Practical optimum bit allocation schemes for EL coding in MB-level
4.3.1. Re-optimization strategy
Note that EL coding has one distinct advantage: because of its enhancement nature, MBs can be skipped arbitrarily and frequently if necessary. Thus, when performing practical bit allocation, we first skip the MBs with negative allocations. From Fig. 11, it is obvious that quite a few MBs are allocated a negative number of bits, and there will be a large mismatch in the number of bits if we simply set the negative values to zero. To decrease this mismatch, we perform re-optimization by reassigning the target number of bits allocated for the frame among the MBs that received a positive number of bits in the first optimization. The distribution of the allocated bits then becomes smoother. In the following step, we continue to skip MBs with negative solutions after the second optimization, and at the same time we skip the MBs with the smallest variance among those with a positive allocation to make up for the mismatch.
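The re-optimization idea can be sketched as follows. This is an illustrative reading rather than the authors' implementation: it repeats the allocation of Eq. (17) on the surviving MBs until no allocation is negative (the text describes a second optimization followed by further skipping), and it omits the additional skipping of the smallest-variance MBs. All function names and values are hypothetical.

```python
import math

def allocate(variances, sizes, k_r, alpha, r_target):
    """Closed-form allocation of Eq. (17) over the given sources."""
    xi = [k / a for k, a in zip(k_r, alpha)]
    weights = [x * n for x, n in zip(xi, sizes)]
    logs = [math.log(v / x) for v, x in zip(variances, xi)]
    offset = (r_target - sum(w * l for w, l in zip(weights, logs))) / sum(weights)
    return [w * (offset + l) for w, l in zip(weights, logs)]

def reoptimize(variances, sizes, k_r, alpha, r_target):
    """Re-run Eq. (17) on the surviving MBs until no allocation is negative."""
    active = list(range(len(variances)))
    bits = [0.0] * len(variances)  # skipped MBs keep an allocation of zero
    while active:
        trial = allocate([variances[i] for i in active],
                         [sizes[i] for i in active],
                         [k_r[i] for i in active],
                         [alpha[i] for i in active], r_target)
        survivors = [i for i, b in zip(active, trial) if b >= 0.0]
        if len(survivors) == len(active):
            for i, b in zip(active, trial):
                bits[i] = b
            break
        active = survivors  # MBs with negative allocations are skipped
    return bits

# Three hypothetical MBs; the first has a much smaller variance.
bits = reoptimize([1.0, 100.0, 80.0], [1, 1, 1], [1.0] * 3, [1.0] * 3, 3.0)
```

Here the first MB is skipped and the full target is redistributed over the other two, so the frame budget is met without clamping any value.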
4.3.2. MB classification
According to our observation, both $K_R$ and $\alpha$ vary with the coding source and strategy. We classify the MBs by their variances, denoted $\{\sigma_i^2 \mid 1 \le i \le L\}$, and the mean of the variances, denoted $\bar{\sigma}^2 = (1/L) \sum_{i=1}^{L} \sigma_i^2$. To keep the computational complexity and implementation cost low, all the MBs are classified into four classes, as follows:
\[
\begin{cases}
C_1: \{ MB_i \mid \sigma_i^2 \le \bar{\sigma}^2 / 2 \}, \\
C_2: \{ MB_i \mid \bar{\sigma}^2 / 2 < \sigma_i^2 \le \bar{\sigma}^2 \}, \\
C_3: \{ MB_i \mid \bar{\sigma}^2 < \sigma_i^2 \le 2 \bar{\sigma}^2 \}, \\
C_4: \{ MB_i \mid \sigma_i^2 > 2 \bar{\sigma}^2 \}.
\end{cases} \tag{18}
\]
Fig. 12. Plot of $\alpha$ versus bitrate using Mobile & Calendar sequence.
Note that OBA is mainly determined by the MB variance; thus, MBs of class $C_1$ are usually allocated a negative number of bits according to expression (17) and are skipped. After classification, each class can be regarded as an independent input source in which all MBs are assumed to have similar mathematical properties and coding behavior. MBs belonging to the same class are treated as having the same class-level $K_R$ and $\alpha$, which can be estimated by the following rewritten forms of (12) and (14):
\[ K_{R_i} = \frac{R_{AC_i}}{P_{NZ_i}}, \tag{19} \]
\[ \alpha_i = \frac{1}{P_{NZ_i}} \ln \frac{\sigma_{EL_i}^2}{D_{EL_i}}. \tag{20} \]
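The classification rule (18) and the parameter estimates (19) and (20) can be sketched as below. The function names and the sample values are hypothetical, and the statistics passed to `class_parameters` are assumed to be aggregates gathered for one class.

```python
import math

def classify_macroblocks(variances):
    """Assign each MB to class 1..4 by comparing its variance with the mean, Eq. (18)."""
    mean_var = sum(variances) / len(variances)
    classes = []
    for v in variances:
        if v <= mean_var / 2.0:
            classes.append(1)
        elif v <= mean_var:
            classes.append(2)
        elif v <= 2.0 * mean_var:
            classes.append(3)
        else:
            classes.append(4)
    return classes

def class_parameters(r_ac, p_nz, var_el, d_el):
    """Eqs. (19) and (20): K_R = R_AC / P_NZ, alpha = ln(var / D) / P_NZ."""
    k_r = r_ac / p_nz
    alpha = math.log(var_el / d_el) / p_nz
    return k_r, alpha

# Four hypothetical MB variances with mean 20.
classes = classify_macroblocks([4.0, 12.0, 22.0, 42.0])
k_r, alpha = class_parameters(100.0, 0.05, 12.0, 6.0)
```

Using only the mean variance as a threshold keeps the classification to a single pass over the MBs, in line with the stated goal of low complexity.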
5. Rate control
In this section, we describe our rate control scheme, which can be used for a two-layered MPEG-2 SNR encoder.
5.1. Rate model
According to the above analysis, we first derive our rate model. Combining Eqs. (1) and (13), we obtain Eq. (21), which can be substituted into Eqs. (11) and (12) to derive our rate control model (22), where $Q_{BL} \ge Q_{EL}$:
\[ P_{NZ_i} = \frac{Q_r - 1}{K_Q}, \tag{21} \]
\[ R_{EL} = C + K_R \, \frac{Q_{BL} / Q_{EL} - 1}{K_Q}. \tag{22} \]
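The rate model (22) and its inversion for $Q_{EL}$ can be sketched as follows. The function names and all numeric values are hypothetical, and the clamp that keeps $Q_r \ge 1$ is an assumption added to respect the condition $Q_{BL} \ge Q_{EL}$.

```python
def el_rate(q_bl, q_el, k_r, k_q, c):
    """Eq. (22): R_EL = C + K_R * (Q_BL / Q_EL - 1) / K_Q."""
    return c + k_r * (q_bl / q_el - 1.0) / k_q

def q_el_for_target(r_target, q_bl, k_r, k_q, c):
    """Invert Eq. (22) for Q_EL, keeping Q_r >= 1 so that Q_EL <= Q_BL."""
    q_r = 1.0 + k_q * (r_target - c) / k_r
    return q_bl / max(q_r, 1.0)

# Hypothetical parameters: Q_BL = 40, K_R = 2000, K_Q = 5, C = 500 bits.
q_el = q_el_for_target(5000.0, 40.0, 2000.0, 5.0, 500.0)
rate = el_rate(40.0, q_el, 2000.0, 5.0, 500.0)
```

Mapping the resulting $Q_{EL}$ onto the discrete quantizer scale values allowed by the MPEG-2 syntax is a separate step that this sketch does not cover.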
5.2. Frame level rate control
The frame level assigns a target number of bits to the current frame to be encoded. We simply use (23) to select the target number of bits for the frame, where $R$ denotes the bitrate, $F$ is the frame rate, and $B_f$, given by (24), is a buffer feedback factor (ISO/IEC JTC1/SC29/WG11, N3908, 2001). $B$ is the current buffer level and $B_s$ is the buffer size, set to $4 \cdot R / F$:
\[ T = \max\left( \frac{R}{F} \cdot B_f,\; \frac{R}{4F} \right), \tag{23} \]
\[ B_f = \frac{B + 2 (B_s - B)}{2B + (B_s - B)}. \tag{24} \]
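A minimal sketch of the frame-level target of Eqs. (23) and (24) is given below. The function names, the 3 Mbit/s bitrate and the 25 frames/s frame rate are illustrative assumptions.

```python
def buffer_feedback(b, b_s):
    """Eq. (24): B_f = (B + 2 (B_s - B)) / (2 B + (B_s - B))."""
    return (b + 2.0 * (b_s - b)) / (2.0 * b + (b_s - b))

def frame_target_bits(bitrate, frame_rate, b, b_s=None):
    """Eq. (23): T = max(R / F * B_f, R / (4 F))."""
    per_frame = bitrate / frame_rate
    if b_s is None:
        b_s = 4.0 * per_frame  # buffer size of 4 * R / F, as stated in the text
    return max(per_frame * buffer_feedback(b, b_s), per_frame / 4.0)

# Hypothetical EL channel: 3 Mbit/s at 25 frames/s, so R / F = 120000 bits.
empty = frame_target_bits(3.0e6, 25.0, b=0.0)
half = frame_target_bits(3.0e6, 25.0, b=240000.0)
full = frame_target_bits(3.0e6, 25.0, b=480000.0)
```

With these definitions the target doubles when the buffer is empty, equals $R/F$ when it is half full, and halves when it is full, so the lower bound $R/(4F)$ acts only as a safeguard.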
Step 1.1 Computation of parameters
First, compute the MB variance, $\sigma_i^2$. Second, classify the MBs into different classes, Class j, by (18). Third, compute the parameters $K_{R_j}$, $\alpha_j$ and $K_{Q_j}$ for each Class j from the previous frame; for the first frame, default values are chosen. Fourth, let $K_{R_i}$, $\alpha_i$ and $K_{Q_i}$ of each MB be equal to $K_{R_j}$, $\alpha_j$ and $K_{Q_j}$ of the class it belongs to. In practice, MBs in Class 1 and part of Class 2 are usually skipped after OBA, so some parameters may not be possible to estimate; they are then set to the default constants.
Step 1.2 Optimum bit allocation
With $\sigma_i^2$, $K_{R_i}$ and $\alpha_i$, we can perform OBA by (17) and our re-optimization scheme (Section 4.3) to compute the target number of bits for each MB.
Fig. 13. Flowchart of rate control at MB-level.
[Step 2] Compute $Q_{EL}$ for the ith macroblock
First, check the number of bits $R_{AC_i}$ allocated to the ith MB. If $R_{AC_i} < 0$, skip the ith MB; otherwise compute $P_{NZ_i}$ by (25):
\[ P_{NZ_i} = \frac{R_{AC_i}}{K_{R_i}}. \tag{25} \]
Second, compute $Q_r$ by (13), from which we obtain the desired $Q_{EL_i}$, and then encode the MB.
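Step 2 can be sketched as a single function that chains Eqs. (25), (13) and (1). The function name, the use of `None` to signal a skipped MB, and the numeric values are assumptions made for illustration.

```python
def q_el_for_macroblock(r_ac, k_r, k_q, q_bl):
    """Return Q_EL for one MB, or None when the MB is skipped."""
    if r_ac < 0.0:
        return None              # negative allocation: skip the MB
    p_nz = r_ac / k_r            # Eq. (25)
    q_r = 1.0 + k_q * p_nz       # Eq. (13)
    return q_bl / q_r            # Eq. (1) solved for Q_EL

# Hypothetical MB: 100 bits allocated, K_R = 2000, K_Q = 20, Q_BL = 50.
q_el = q_el_for_macroblock(100.0, 2000.0, 20.0, 50.0)
skipped = q_el_for_macroblock(-5.0, 2000.0, 20.0, 50.0)
```

Since $Q_r \ge 1$ for any non-negative allocation, the returned $Q_{EL}$ never exceeds $Q_{BL}$.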
[Step 3] Update model parameters for each class
After encoding the ith MB belonging to Class j, compute $K_{R_j}$ by (19) and $K_{Q_j}$ by (13) for Class j from the previously encoded MBs belonging to Class j; then update $K_{R_j}$ and $K_{Q_j}$.
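One way to realize the update in Step 3 is to keep running sums per class, as in the sketch below. Pooling all encoded MBs of a class through ratios of sums is an assumption, since the text does not specify how the per-MB measurements are combined; the class name and the numbers are hypothetical.

```python
class ClassModel:
    """Running estimates of K_R and K_Q for one MB class."""

    def __init__(self, k_r_default, k_q_default):
        self.k_r = k_r_default
        self.k_q = k_q_default
        self._bits = 0.0        # accumulated R_AC
        self._p_nz = 0.0        # accumulated P_NZ
        self._q_excess = 0.0    # accumulated (Q_r - 1)

    def update(self, r_ac, p_nz, q_r):
        """Fold in the statistics of one encoded MB; ignore MBs with P_NZ = 0."""
        if p_nz <= 0.0:
            return
        self._bits += r_ac
        self._p_nz += p_nz
        self._q_excess += q_r - 1.0
        self.k_r = self._bits / self._p_nz       # Eq. (19) on pooled data
        self.k_q = self._q_excess / self._p_nz   # Eq. (13) solved for K_Q

model = ClassModel(2000.0, 20.0)
model.update(120.0, 0.06, 2.2)
model.update(80.0, 0.02, 1.6)
```

MBs with no non-zero coefficients are ignored so that the defaults remain in force until usable data arrive, which matches the fallback to default constants described in Step 1.1.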
6. Experimental results
In this section, we apply our approach to an SNR scalable encoder, which is based on the non-scalable MPEG-2 coder (see footnote 1), to verify the performance of our rate control algorithm. We test our rate control algorithm for controlling the EL bitrate with the BL encoded either in VBR mode or in CBR mode (TM5 rate control). We also compare our proposed algorithm with TM5 rate control when both are applied to EL compression.
The MPEG-2 SNR scalable encoder structure we employ is the same as that shown in Fig. 2. The raw sequences used in the test, shown in Fig. 14, are as follows: (a) Boating (720 × 576, 4:2:0), 100 frames; (b) Mobile & Calendar (720 × 576, 4:2:0), 100 frames. In our experiments, the length of the GOP (group of pictures) is 15, and the distance between the I frame and the P frame is 3.
Tables 1 and 2 show the PSNR performance comparison between our proposed algorithm and TM5 rate control. In Table 1, the BL is encoded in VBR mode while the EL is encoded at different EL bitrates. Table 2 shows the case in which the BL is encoded in CBR mode (employing TM5 rate control) while EL rate control is performed by our proposed algorithm and by TM5 rate control, respectively. Both tables show that our rate control algorithm achieves a higher average PSNR than the TM5 rate control model. Examples of the PSNR comparison for each frame are plotted in Figs. 15 and 16.
In the following, we compare the MSE at the MB level. As shown in Fig. 17, we take one frame of the Mobile & Calendar sequence and plot the MSE of the EL source frame, of the reconstructed frame after TM5 rate control, and of the reconstructed frame after our proposed algorithm, where the MBs are sorted by MSE in ascending order. It can be observed that the gradient for our proposed algorithm is gradual, which implies that the picture will appear smoother and that its quality will be more constant than that of the picture controlled by TM5.
1 See http://www.mpeg.org/mpeg/mssg/#source
Fig. 14. Raw sequences.
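Table 1
Comparison when the BL is encoded in VBR mode

| Video sequence | BL bitrate (bit/s) (VBR) | BL PSNR (dB) | EL bitrate (bit/s) | BL and EL average PSNR (dB), TM5 | BL and EL average PSNR (dB), Proposed | Gain (dB) |
|---|---|---|---|---|---|---|
| Boating (mquant = 50) | 3.9M | 29.974 | 2M | 30.968 | 31.488 | 0.520 |
| | | | 3M | 31.474 | 32.332 | 0.858 |
| | | | 4M | 31.968 | 33.095 | 1.127 |
| Mobile & Calendar (mquant = 50) | 4.3M | 27.097 | 2M | 27.755 | 27.979 | 0.224 |
| | | | 3M | 28.070 | 28.485 | 0.415 |
| | | | 4M | 28.427 | 29.022 | 0.595 |

Table 2
Comparison when the BL is encoded in CBR mode

| Video sequence | BL bitrate (bit/s) (CBR) | BL PSNR (dB) | EL bitrate (bit/s) | BL and EL average PSNR (dB), TM5 | BL and EL average PSNR (dB), Proposed | Gain (dB) |
|---|---|---|---|---|---|---|
| Boating (mquant = 50) | 3M | 27.975 | 2M | 29.502 | 30.339 | 0.837 |
| | | | 3M | 30.130 | 31.371 | 1.241 |
| | | | 4M | 30.654 | 32.265 | 1.611 |
| Mobile & Calendar (mquant = 50) | 3M | 28.314 | 2M | 28.879 | 29.733 | 0.854 |
| | | | 3M | 29.224 | 30.459 | 1.235 |
| | | | 4M | 29.622 | 31.013 | 1.391 |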
Fig. 15. PSNR performance comparison between TM5 and our proposed algorithm for EL coding when BL is encoded in the CBR mode and when EL is encoded at 3 Mbit/s.
Fig. 16. PSNR performance comparison between TM5 and our proposed algorithm for EL coding when BL and EL are encoded at 3 Mbit/s, respectively.
Fig. 17. Comparison of the MSE of the reconstructed frame in MB-level.
the bitrate and $Q_r$. Second, we find that the conventional linear rate model, proposed for non-scalable coding, needs to be improved. Specifically, there is significant variation of the gradient in EL coding, and the scheme proposed in this paper deals with this problem. Third, because the rate control parameters are not stable enough for robust video compression, we propose an MB classification method for performing EL OBA. Fourth, we propose a re-optimization scheme for EL OBA to decrease the rate control error. In addition, other related useful relationships are integrated to form our EL rate control framework. Experimental results show that our rate control algorithm performs well for SNR EL coding and yields an improvement over the classical TM5 rate control.
References
Arnold, J.F., Frater, M.R., Wang, Y., 2000. Efficient drift-free signal-to-noise ratio scalability. IEEE Trans. Circuits Syst. Video Technol. 10, 70–82.
Chiang, T., Zhang, Y.-Q., 1997. A new rate control scheme using quadratic rate distortion model. IEEE Trans. Circuits Syst. Video Technol. 7, 246–250.
Ding, W., Liu, B., 1996. Rate control of MPEG video coding and recording by rate-quantization modeling. IEEE Trans. Circuits Syst. Video Technol. 6, 12–20.
Gallant, M., Kossentini, F., 2001. Rate-distortion optimized layered coding with unequal error protection for robust internet video. IEEE Trans. Circuits Syst. Video Technol. 11, 357–372.
Gallant, M., Kossentini, F., 1999. Efficient scalable DCT-based video coding. In: Proceedings of the 1999 IEEE Canadian Conference on Electrical and Computer Engineering, vol. 2, Shaw Conference Center, Edmonton, Alberta, Canada, pp. 9–12.
Gersho, A., Gray, R.M., 1992. Vector Quantization and Signal Compression. Kluwer Academic Publishers, Norwell, MA.
Ghanbari, M., 1999. Video Coding: An Introduction to Standard Codecs. The Institution of Electrical Engineers, London, UK.
Hang, H.-M., Chen, J.-J., 1997. Source model for transform video coder and its application. I. Fundamental theory. IEEE Trans. Circuits Syst. Video Technol. 7, 287–298.
He, Z., Mitra, S.K., 2002a. A linear source model and a unified rate control algorithm for DCT video coding. IEEE Trans. Circuits Syst. Video Technol. 11, 970–982.
He, Z., Mitra, S.K., 2002b. Optimum bit allocation and accurate rate control for video coding via ρ-domain source modeling. IEEE Trans. Circuits Syst. Video Technol. 12, 840–849.
He, Z., Kim, Y.K., Mitra, S.K., 2001. Low-delay rate control for DCT video coding via ρ-domain source modeling. IEEE Trans. Circuits Syst. Video Technol. 11, 928–940.
Information Technology, ISO/IEC, 1995. Generic coding of moving pictures and associated audio information: Video.
ISO/IEC JTC1/SC29/WG11, N3908, 2001. MPEG-4 Video Verification Model 18.0, January.
Jayant, N.S., Noll, P., 1984. Digital Coding of Waveforms: Principles and Applications to Speech and Video. Prentice-Hall, Englewood Cliffs, NJ.
Lam, E.Y., Goodman, J.W., 2000. A mathematical analysis of the DCT coefficient distributions for images. IEEE Trans. Image Processing 9, 1661–1665.
Lee, B., Park, K., Hwang, J., 1997. H.263-based SNR scalable video codec. IEEE Trans. Consumer Electron. 43, 614–622.
Miloslavsky, E., Zakhor, A., 1999. Rate control for layered video compression using matching pursuits. In: Proceedings of Image Processing, ICIP 99, vol. 2, pp. 357–361.
MPEG-2 Video Test Model 5, ISO/IEC JTC1/SC29/WG11, 1993. MPEG93/457, April.
Ribas-Corbera, J., Lei, S., 1999. Rate control in DCT video coding for low-delay communications. IEEE Trans. Circuits Syst. Video Technol. 9, 172–185.
Ribas-Corbera, J., Neuhoff, D., 1996. On the optimal motion vector accuracy for block-based motion-compensated video coders. In: Proceedings of the IS&T/SPIE Dig. Video Comp.: Alg. & Tech., San Jose, CA, pp. 302–314.
Tao, B., Dickinson, B.W., Peterson, H.A., 2000. Adaptive model-driven bit allocation for MPEG video coding. IEEE Trans. Circuits Syst. Video Technol. 10, 147–157.
Wang, L., 2000. Rate control for MPEG video coding. Visual Communication and Image Processing, SPIE, vol. 2501.
Wang, Y., Ostermann, J., Zhang, Y.-Q., 2001. Digital Video Processing and Communication. Prentice-Hall, Englewood Cliffs, NJ, Ch. error control in video communication.
Wilson, D., Ghanbari, M., 1999. Optimization of MPEG-2 SNR scalable codecs. IEEE Trans. Image Processing 8, 1435–1438.
Wu, D., Hou, Y.T., Zhu, W., Zhang, Y.-Q., Peha, J.M., 2001. Streaming video over the Internet: approaches and directions. IEEE Trans. Circuits Syst. Video Technol. 11, 282–300.