Enhancement layer rate control for high bitrate SNR scalable video coding


Jun Xie, Liang-Tien Chia*

Division of Computer Communication, School of Computer Engineering, Nanyang Technological University, 639798, Singapore

Received 5 March 2003; accepted 8 July 2004. Available online 25 September 2004.

Abstract

Signal-to-noise ratio (SNR) scalability has been incorporated into the MPEG-2 video-coding standard to allow for the delivery of two services with the same spatial and temporal resolution but different levels of quality. Scalable video coding has many advantages, such as its ability to cope with bandwidth variations and its high flexibility. However, few accurate rate control schemes for enhancement layer coding have been proposed. In this paper, we present a novel enhancement layer rate control scheme for MPEG-2 SNR scalable video coding. First, we address the necessity and the problems of rate control for layered coding. Then, by analyzing the characteristics of compressed data in the enhancement layer, we derive our rate control model. The proposed rate control model is applied to a drift-free SNR scalable encoder, and we show that it performs well for coding of the enhancement-layer bitstream.

Keywords: Rate control; SNR scalable coding; Bit allocation; MPEG-2; Video coding and transmission

doi:10.1016/j.jvcir.2004.07.001

*Corresponding author. Fax: +65 6792 6559.

E-mail address: [email protected] (L.T. Chia).

www.elsevier.com/locate/jvci

1. Introduction

The idea of scalable video coding is that a scalable video encoder generates at least two bitstreams: a base layer (BL) that carries the basic and most vital video information and can be decoded independently, and one or more enhancement layers (ELs) that add residual information to enhance the quality of the BL image and must be decoded together with the BL bitstream. Compared with non-scalable coding, scalable coding has two main advantages. First, it provides a scalable and more flexible service; with an effective strategy, scalable coding can better satisfy the different video quality requirements of different clients (Wu et al., 2001). Second, scalable coding can cope with problems that arise during multimedia communication, such as packet loss and congestion (Gallant and Kossentini, 2001; Wang et al., 2001).

Rate control is widely used to trade off visual quality against compressed bitrate. Many rate control algorithms in the literature (MPEG-2 Video Test Model 5 and JTC1/SC29/WG11, 1993; Wang, 2000; Ding and Liu, 1996; Tao et al., 2000; Chiang and Zhang, 1997; Ribas-Corbera and Lei, 1999; Hang and Chen, 1997; He and Mitra, 2002a; He et al., 2001) have been proposed for non-scalable coding, based on analyzing the characteristics of the data source and the encoding behavior. There is also a need to maximize picture quality at different bitrate limits when multiple layered bitstreams are delivered over channels with fluctuating quality of service (QoS), or even without any QoS guarantee. In SNR scalable coding, conventional rate control algorithms can be applied to the BL directly, because BL compression differs little from non-scalable compression. However, conventional rate control algorithms are not optimized for EL compression, because certain characteristics of EL coding differ from those of BL coding. For example, most analytical rate models (Chiang and Zhang, 1997; Ribas-Corbera and Lei, 1999; Hang and Chen, 1997) are established by approximating the probability distribution of the DCT (AC) coefficients as a Laplacian distribution; however, the probability density function (PDF) of the EL source is uncertain and depends on the quantization step size of the BL. In the extreme case, it becomes uniform as the quantization level of the BL approaches infinity (Jayant and Noll, 1984). It is therefore difficult to model the PDF of the EL data source in a tractable way and to obtain an accurate rate model from it. As another example, the recent ρ-domain rate control (He and Mitra, 2002a; He et al., 2001) is largely source-independent: the slope of its linear rate model, which is the ratio between the bitrate and the number of non-zero quantized coefficients, is nearly constant in non-scalable coding, but it varies significantly in EL coding, so this rate model must be adapted to the characteristics of EL compression. In addition, if we follow the traditional optimum bit allocation (OBA) solution (Jayant and Noll, 1984), a substantial number of the allocated values are negative, and simply setting those negative values to zero introduces rate control error. Therefore, directly applying rate control algorithms designed for non-scalable coding to the EL is not accurate enough.

In previous work on scalable coding, two ideas were proposed to optimize the quantization setting of the EL: rate-distortion (RD) optimization using a Lagrangian multiplier function (Wilson and Ghanbari, 1999; Gallant et al., 1999), and exploitation of the characteristics of the human visual system (HVS) (Lee et al., 1997). However, neither gives a practical scheme for controlling the bitrate of the different layers. In another study, the rate control algorithms for an SNR scalable coder proposed in Miloslavsky and Zakhor (1999) did not provide an analytic expression or deep insight into the behavior of EL compression. Therefore, the problem of rate control for scalable coding has not been fully investigated.

In this paper, we focus on SNR scalable high-bitrate video coding and propose a rate control algorithm based mainly on the following three points: first, the approximately linear relationship between the final bitrate of the EL and the percentage of zero-valued coefficients after quantization; second, a strong correlation between the bitrate and the ratio between the quantization parameter of the BL, denoted as $Q_{BL}$, and that of the EL, denoted as $Q_{EL}$; third, a practical solution to some problems of EL OBA. After applying our approach to an MPEG-2 drift-free SNR scalable coder and comparing it with the classical TM5 rate control algorithm (MPEG-2 Video Test Model 5 and JTC1/SC29/WG11, 1993), we find the results of our proposed algorithm promising.

This paper is organized as follows. In Section 2, we describe two different structures of the two-layered MPEG-2 SNR scalable coder. In Section 3, we analyze the characteristics of SNR scalable coding and describe some useful relationships employed in our scheme. Section 4 presents the RD optimization scheme for EL coding. In Section 5, we describe our rate control framework. Experimental results are provided in Section 6, and supporting experiments are also included in the respective sections. Lastly, we draw conclusions.

2. Two-layered MPEG-2 SNR scalable coder

SNR scalability is designed to allow for the delivery of two services with the same spatial and temporal resolution but different levels of quality.

The general structure of an SNR scalable coder is shown in Fig. 1. The encoding process of the BL is the same as that of a non-scalable encoder. The BL on its own contains quantization error, and the EL is designed to encode this error in order to improve the quality of the whole picture. For I and P frames, both the BL and the EL compressed data are inversely quantized to construct the stored frame used for later motion prediction and motion compensation.

This type of structure has a drawback due to the tight coupling between the two layer bitstreams (Ghanbari, 1999). If any information in the EL bitstream is lost and the BL bitstream is decoded by itself, the decoded BL pictures suffer from picture drift, which degrades picture quality.

Naturally, drift-free pictures are required, and this can be achieved by loosening the tight coupling between the two layers. In Ghanbari (1999) and Arnold and Frater (2000), several drift-free encoder structures are proposed. We adopt the structure shown in Fig. 2, which removes the EL contribution to prediction, to study the behavior of EL data compression.

3. Characteristics of enhancement layer compression

To achieve robust and accurate rate control schemes, there is a need to ﬁnd an eﬀective rate model based on the analysis of coding behaviors. In this section, we will analyze some important characteristics of EL coding and then describe some key relationships employed by our rate control algorithm.

Fig. 1. A two-layer SNR scalable encoder with drift at the base layer.

Fig. 2. A two-layer SNR scalable encoder without drift.

We first define some abbreviations for the EL:

$R_{EL}$: output bitrate,
$R_{AC}$: number of bits for AC coefficient compression,
$C$: number of bits for generalized syntax description,
$P_{NZ}$: percentage of non-zero coefficients among the quantized DCT coefficients,
$Q_r$: ratio between $Q_{BL}$ and $Q_{EL}$, as in Eq. (1).

$$Q_r = \frac{Q_{BL}}{Q_{EL}}. \qquad (1)$$

3.1. Relationship between $R_{EL}$ and $Q_r$

In BL coding, the quantizer step size is specified for each DCT coefficient in a macroblock (MB) via a quantization matrix of 64 elements, denoted as quant_matrix. The other important parameter is the quantization scale factor mquant. The quantizer step size used for quantizing the $i$th DCT coefficient in each of the blocks of a particular MB is given by $\text{quant\_matrix}_i \cdot \text{mquant}$. Thus, according to uniform quantization theory, the amplitude of the quantization error is distributed over the range

$$\left[-\frac{\text{quant\_matrix}_i \cdot \text{mquant}}{2},\ \frac{\text{quant\_matrix}_i \cdot \text{mquant}}{2}\right].$$

To express the quantization error over the whole BL picture, the range is rewritten as

$$\left[-\frac{Q_{BL} \cdot \text{quant\_matrix}_i}{2},\ \frac{Q_{BL} \cdot \text{quant\_matrix}_i}{2}\right]. \qquad (2)$$

It is the quantization noise of the BL that the EL encodes, so the source data for the EL to quantize are distributed over the range (2), which is an important characteristic of EL coding. It shows that there must be a strong relationship between $Q_{BL}$ and the choice of $Q_{EL}$. Fig. 3 shows the probability distributions of the AC coefficients of the BL and EL for the Stefan and Mobile & Calendar sequences. The AC coefficients in the BL are usually modelled as a Laplacian distribution (Lam and Goodman, 2000). After BL quantization, the amplitude of the AC coefficients in the EL lies within the range (2). In most cases, the PDF of the AC coefficients in the EL still has a shape similar to that of the BL, and it is also affected by $Q_{BL}$.

In general, analytic rate control approaches for non-scalable coding are based on analyzing the PDF of the DCT coefficients. For quantization with a uniform scalar quantizer of step size $Q$, the difference frame rate is estimated as $R(Q) = k \cdot H(Q)$, where $H(Q)$ is the empirical entropy of the $Q$-quantized coefficients and $k$ is an empirical constant that accounts for multiplicative factors, including the mismatch between entropies computed from the ideal and the practical DCT coefficient distributions, and the entropy coding methods applied after quantization, such as run-length coding (RLC) and variable-length coding (VLC) (Hang and Chen, 1997). According to general high-bitrate coding theory and uniform quantization theory, we use the approximation (3) from Ribas-Corbera and Neuhoff (1996) to compute the difference frame entropy of the EL, denoted as $H(Q)$ or $R(Q)$, where $\sigma^2$ is the variance of the difference frame pixels. We use (4) from Gersho and Gray (1992) to measure the distortion $D$ under the mean square error (MSE) criterion:

$$R(Q) = \frac{1}{2}\log_2\left(\frac{2e^2\sigma^2}{Q^2}\right), \qquad (3)$$

$$D = \frac{Q^2}{12}. \qquad (4)$$

For EL coding, the distortion within the BL is the source for the EL to encode. Therefore, the variance of the EL difference frame pixels, denoted as $\sigma_{EL}^2$, is the distortion of the BL measured by MSE. Thus, (4) can be rewritten as (5):

$$\sigma_{EL}^2 = \frac{Q_{BL}^2}{12}. \qquad (5)$$

According to traditional information theory, the RD function is expressed by (6), where $\sigma^2$ is the variance of the data source and the factor $\varepsilon^2$ depends on the PDF of the data source as well as the type of encoding used (Jayant and Noll, 1984):

$$D(R) = \varepsilon^2 \sigma^2 2^{-2R}. \qquad (6)$$

Applying the expressions (4)–(6) to EL coding, we obtain the group of formulas (7). From (7), we obtain the approximate expression (8) for the EL frame entropy, as shown in Fig. 4:

$$\begin{cases} D_{EL}(R_{EL}) = \varepsilon^2 \sigma_{EL}^2\, 2^{-2R_{EL}}, \\ \sigma_{EL}^2 = \dfrac{Q_{BL}^2}{12}, \\ D_{EL} = \dfrac{Q_{EL}^2}{12}, \end{cases} \qquad (7)$$

$$R_{EL}(Q_{EL}) = \log_2\frac{Q_{BL}}{Q_{EL}} + \log_2\varepsilon. \qquad (8)$$

Fig. 3. Probability distribution of AC (0,1) coefficients: (A) and (C) the BL and EL in Mobile & Calendar, respectively; (B) and (D) the BL and EL in Stefan, respectively.

The distortion of the BL can be reduced only when $Q_{EL} < Q_{BL}$, and $Q_{EL}$ is greater than 1. Therefore, $Q_r$ is in the range $[1, Q_{BL}]$. Moreover, in practice $Q_r$ varies within a small range, because very large or very small quantization parameters are seldom used. Thus, formula (8) can be simplified to

$$R_{EL}(Q_{EL}) \propto (Q_r - 1). \qquad (9)$$
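One way to read the step from (8) to (9) is as a first-order expansion of $\log_2 Q_r$ around $Q_r = 1$, where $\ln Q_r \approx Q_r - 1$. The source states only the proportionality, so this reading is an interpretation. The minimal sketch below compares the logarithmic model (8) with that linear approximation; the function names are hypothetical and $\varepsilon = 1$ is an assumed placeholder, since $\varepsilon$ depends on the source PDF and the coder.

```python
import math


def el_rate_log(q_bl: float, q_el: float, eps: float = 1.0) -> float:
    """Eq. (8): R_EL = log2(Q_BL / Q_EL) + log2(eps). eps = 1 is an assumption."""
    return math.log2(q_bl / q_el) + math.log2(eps)


def el_rate_linear(q_bl: float, q_el: float, eps: float = 1.0) -> float:
    """First-order expansion of Eq. (8) around Q_r = 1 (our reading of Eq. (9))."""
    q_r = q_bl / q_el
    return (q_r - 1.0) / math.log(2.0) + math.log2(eps)


# Compare both models for a BL quantizer of 40 and a few EL quantizers.
for q_el in (40, 36, 32, 28):
    print(q_el, round(el_rate_log(40, q_el), 4), round(el_rate_linear(40, q_el), 4))
```

The two curves coincide at $Q_r = 1$ and separate slowly as $Q_r$ grows, which is consistent with the remark that $Q_r$ stays within a small range in practice.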

Fig. 4. Plot of $R_{EL}$ versus $Q_r$.

In the experiments, we use the test sequences Mobile & Calendar, Boating, and Stefan to verify the relationship in formula (9). We fix the mquant of the BL at 40 and encode the first frame of each sequence with different values of $Q_r$. The relationship between the actual frame entropy of the EL and $Q_r$ is shown in Fig. 5.

The correlation coefficient, expressed as (10), is used to quantify the relationship. For the three pictures, the correlation coefficients are 0.9998, 0.9981 and 0.9994, respectively, which implies that a linear relationship exists between them. The aim of our control algorithm for the EL therefore turns from the traditional search for a suitable $Q_{EL}$ to a search for a suitable $Q_r$, from which the required $Q_{EL}$ follows. In (10), $\mathrm{Cov}(x, y)$ is the covariance of $x$ and $y$, and $\mathrm{Var}(x)$ and $\mathrm{Var}(y)$ are their variances:

$$\rho_{xy} = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)}\,\sqrt{\mathrm{Var}(y)}}. \qquad (10)$$
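The correlation coefficient of Eq. (10) is the standard Pearson coefficient. A minimal sketch of how it could be computed from two measured series, for example $Q_r$ and the EL frame entropy, is given below; the function name is hypothetical and the population form of variance and covariance is assumed (the normalization cancels in the ratio).

```python
import math


def correlation(x, y):
    """Pearson correlation: Cov(x, y) / (sqrt(Var(x)) * sqrt(Var(y)))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))
```

A value close to 1, such as those reported above, indicates that the two series are almost perfectly linearly related.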

3.2. Relationship between $R_{EL}$ and $P_{NZ}$

The ρ-domain rate control algorithms for non-scalable coders were proposed on the basis of the approximately linear relationship between the final bitrate and the percentage of zero-valued quantized coefficients (He and Mitra, 2002a; He et al., 2001). We find that this linear relationship also exists in EL coding.

$R_{EL}$, as expressed in (11), is composed of the AC component, denoted as $R_{AC}$, and the generalized header and syntax bits for the frame, denoted as $C$, which is relatively constant. Note that all the coefficients in the EL are coded with the non-intra VLC table and that there is no DC component, because of the differential nature of the EL (Information Technology and ISO/IEC, 1995):

$$R_{EL} = R_{AC} + C. \qquad (11)$$

An example of the relationship between $R_{AC}$ and $P_{NZ}$ is plotted in Fig. 6, where the BL is encoded with a fixed mquant of 50. We use Eq. (12) to express the approximately linear relationship, where $K_R$ is the slope; its characteristics are discussed below. On closer observation of Fig. 6, the approximately linear relationship does not hold for small values of $P_{NZ}$. In most EL coding scenarios, $P_{NZ}$ takes a small value at the frame level, e.g. about 0.05, and also varies significantly between MBs. This motivates us to investigate the correlation between $R_{AC}$ and $P_{NZ}$ at different values of $P_{NZ}$. In Fig. 7, we plot the correlation coefficients at different $P_{NZ}$. The correlation coefficient drops significantly for lower values of $P_{NZ}$, indicating that the linear rate model needs further improvement for EL coding:

$$R_{AC} = K_R \cdot P_{NZ}. \qquad (12)$$

Fig. 5. Relationship between $R_{AC}$ and $Q_r$.

For non-scalable coding, the variation of $K_R$ is small (He and Mitra, 2002a; He et al., 2001), whereas for EL coding it varies as shown in Fig. 8, where the plots in the top row show the relationship between $K_R$ and $P_{NZ}$ and the plots in the bottom row show the relationship between $K_R$ and $R_{AC}$ when the BL is encoded with a fixed mquant of 50. Fig. 8 shows that $K_R$ is nearly constant only in the range of larger $P_{NZ}$, which corresponds to a high bitrate beyond that used in a typical video scenario. $K_R$ also exhibits a steep slope in the smaller $P_{NZ}$ range (corresponding to the lower bitrate range), which is an obstacle to realizing a simple rate control strategy. From (12), $K_R$ is simply a statistical parameter whose value indicates the average number of bits used for each non-zero coefficient, because in a hybrid DCT and entropy coder, bits are mainly spent on coding non-zero coefficients. When non-zero coefficients occur with a very small probability, the source does not match the cases that the entropy coding techniques are designed to handle, and $K_R$ increases steeply in the range of lowest $P_{NZ}$, as shown in Fig. 8. In EL coding, $P_{NZ}$ is usually small at the frame level, and at the same time $K_R$ and $P_{NZ}$ vary greatly between MBs.

It is not easy to establish an accurate and simple expression for the relationship between them. Therefore, we treat $K_R$ as a different constant in each of several classes, using an MB-classification strategy that is described in Section 4.

Fig. 6. Plot of $R_{AC}$ versus $P_{NZ}$.

Fig. 7. Plot of correlation coefficients versus $P_{NZ}$.

3.3. Relationship between $Q_r$ and $P_{NZ}$

Another important observation is the strong correlation between $Q_r$ and $P_{NZ}$. Using the same test sequences and encoding strategy as above, Fig. 9 depicts the relationship for three sets of gathered data.

The correlation coefficients between $Q_r$ and $P_{NZ}$ for sample pictures of the different sequences are 0.9975, 0.9984 and 0.9968, respectively. Therefore, the approximately linear expression (13), where $K_Q$ is the slope, can be used to model this relationship; the linear relationship can also be seen in Fig. 9:

$$Q_r = 1 + K_Q \cdot P_{NZ}. \qquad (13)$$
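The source does not say how the slope $K_Q$ in Eq. (13) is obtained from measurements. One plausible choice, shown here purely as an assumption, is a least-squares fit constrained to pass through the point $(P_{NZ}, Q_r) = (0, 1)$, which is the intercept fixed by (13). The function name and the sample data are hypothetical.

```python
def fit_k_q(p_nz, q_r):
    """Least-squares slope of (Q_r - 1) against P_NZ with the intercept fixed at 0.

    This fitting rule is an assumption; Eq. (13) only fixes the model form
    Q_r = 1 + K_Q * P_NZ.
    """
    num = sum(p * (q - 1.0) for p, q in zip(p_nz, q_r))
    den = sum(p * p for p in p_nz)
    if den == 0.0:
        raise ValueError("at least one non-zero P_NZ sample is required")
    return num / den


# Hypothetical samples generated from Q_r = 1 + 8 * P_NZ.
samples_p = [0.02, 0.05, 0.10]
samples_q = [1.16, 1.40, 1.80]
print(fit_k_q(samples_p, samples_q))
```

Fixing the intercept keeps the fitted line consistent with the limiting case $Q_{EL} = Q_{BL}$, in which no non-zero EL coefficients are expected.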

Fig. 8. Plots of $K_R$ versus $P_{NZ}$ and $K_R$ versus $R_{AC}$.

4. Optimum bit allocation scheme for EL coding

4.1. Theoretical model for EL optimum bit allocation

In this section, we will look at the distortion model for EL coding and propose a practical OBA scheme for EL coding at MB-level.

4.1.1. Distortion model for EL coding

Without accurate distortion and rate models, OBA cannot be carried out (Ding and Liu, 1996). Naturally, to achieve better coding performance through OBA, an accurate RD model for EL coding is needed.

In He and Mitra (2002b), a related ρ-domain distortion model was developed, which motivates us to study the relationship between distortion (measured by MSE) and $P_{NZ}$ and to see whether it is suitable for EL coding. Let $D_0 = D_{EL}/\sigma_{EL}^2 = D_{EL}/D_{BL}$ be the normalized distortion, where $\sigma_{EL}^2$ is the variance of the EL source, which is also equal to the distortion of the BL, $D_{BL}$, and $D_{EL}$ is the distortion of the EL. We plot $D_0$ versus $P_{NZ}$ when the BL is encoded with a fixed scheme and the EL is encoded with different quantizers, as shown in Fig. 10. An exponential function can still describe the relationship between $D_{EL}$ and $P_{NZ}$; therefore, we use the distortion model proposed in He and Mitra (2002b) for the EL distortion, expressed in (14), where $\alpha$ is a statistical parameter. Note that $\alpha$ takes a larger value than in non-scalable coding and that $P_{NZ}$ is usually small, e.g. about 0.05 at the frame level of EL coding:

$$D_{EL}(P_{NZ}) = \sigma_{EL}^2\, e^{-\alpha P_{NZ}}. \qquad (14)$$

4.1.2. Theoretical optimum bit allocation for EL coding

OBA assigns a specific number of bits to each data source in order to minimize the overall distortion and achieve the best quality. This is usually solved with the Lagrange multiplier method, formulated as

$$F = \min(D + \lambda R). \qquad (15)$$

Fig. 9. Relationship between $Q_r$ and $P_{NZ}$.

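With the rate model (12) and the distortion model (14) for EL coding, we can derive the theoretical OBA scheme (He and Mitra, 2002b). Let $\{S_i \mid 1 \le i \le L\}$ be the input sources and $R_T$ be the target number of bits. The problem can be stated as (16), where $N_i$ is the size of source $S_i$, and the optimal solution for $S_i$ is given in (17):

$$\begin{cases} D_{EL_i} = \sigma_{EL_i}^2\, e^{-\alpha_i P_{NZ_i}} N_i, \\ R_{EL_i} = K_{R_i} P_{NZ_i} N_i, \\ F = \min\limits_{P_{NZ_i}} \left[ \sum\limits_{i=1}^{L} \sigma_{EL_i}^2\, e^{-\alpha_i P_{NZ_i}} N_i + \lambda \left( \sum\limits_{i=1}^{L} K_{R_i} P_{NZ_i} N_i - R_T \right) \right], \end{cases} \qquad (16)$$

$$R_{EL_i} = \frac{\xi_i N_i}{\sum_{j=1}^{L} \xi_j N_j} \left( R_T - \sum_{j=1}^{L} \xi_j N_j \ln\frac{\sigma_{EL_j}^2}{\xi_j} \right) + \xi_i N_i \ln\frac{\sigma_{EL_i}^2}{\xi_i}, \qquad \xi_i = \frac{K_{R_i}}{\alpha_i}. \qquad (17)$$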
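The closed-form solution (17) can be evaluated directly once the variances, $K_{R_i}$, $\alpha_i$ and $N_i$ are known. The sketch below is a minimal, unconstrained implementation under that assumption; the function name and the example numbers are hypothetical. It deliberately does not enforce non-negative allocations, so that the negative values discussed in the next subsection remain visible.

```python
import math


def optimum_bit_allocation(var_el, k_r, alpha, n, r_t):
    """Evaluate Eq. (17) for every source; xi_i = K_R_i / alpha_i.

    Returns the unconstrained allocation, which always sums to r_t but may
    contain negative entries for low-variance sources.
    """
    xi = [k / a for k, a in zip(k_r, alpha)]
    weight = [x * m for x, m in zip(xi, n)]               # xi_i * N_i
    log_term = [math.log(v / x) for v, x in zip(var_el, xi)]  # ln(sigma_i^2 / xi_i)
    offset = (r_t - sum(w * g for w, g in zip(weight, log_term))) / sum(weight)
    return [w * (offset + g) for w, g in zip(weight, log_term)]


# Hypothetical two-source example: the low-variance source receives a negative value.
alloc = optimum_bit_allocation([100.0, 1.0], [6.0, 6.0], [3.0, 3.0], [1, 1], 4.0)
print(alloc, sum(alloc))
```

Because the Lagrangian solution only enforces the total budget, sources whose variance is far below the others can be driven below zero, which motivates the practical scheme that follows.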

4.2. Challenges for practical optimum bit allocation for EL coding

The above result is only a theoretical solution; when it is applied to EL coding, there are some constraints and challenges to overcome.

Fig. 10. Distortion curves of each frame in the sequences.

4.2.1. Negative allocated bit number

First of all, one constraint is neglected: $R_i \ge 0$. In practice, it is meaningless to assign a negative number of bits to a data source. However, the error cannot be ignored if we simply set the negative solutions of (17) to zero. For instance, in MB-level OBA, many MBs are assigned a negative number of bits because of the large variance ratio among MBs in EL coding. Fig. 11 shows an example of the variance distribution and the corresponding theoretically assigned number of bits at the MB level, for one frame of the Mobile & Calendar sequence when the BL and EL are both encoded at a target bitrate of 3 Mbit/s. If the negative allocations are simply set to zero, there will be a large mismatch between the actual allocated number of bits and the target number of bits.

4.2.2. Parameters $K_R$ and $\alpha$

Second, when we perform OBA according to formula (17), there is one uncertain key factor, namely $\xi_i = K_{R_i}/\alpha_i$. The properties of $K_R$ were discussed at the end of Section 3.2 and are shown in Fig. 8. As seen in Fig. 12, different values of $K_R$ occur at the frame level when a Mobile & Calendar frame is encoded at different bitrates. Thus, for EL coding, neither $K_R$ nor $\alpha$ is nearly constant, and both are therefore difficult to formulate, which is a barrier to performing bit allocation and deciding the EL quantization parameter. Even if we used a mathematical tool to estimate them approximately, the resulting complexity and latency would threaten the stability of our rate control algorithm. In the following, we present our schemes for dealing with these challenges.

Fig. 11. Example of the MB variance distribution and the corresponding assigned bit number after MB-level OBA.

4.3. Practical optimum bit allocation schemes for EL coding in MB-level

4.3.1. Re-optimization strategy

One distinct advantage of EL coding is that MBs can be skipped arbitrarily and frequently, if necessary, because of its enhancement nature. When performing practical bit allocation, we therefore first skip the MBs with negative allocated solutions. Fig. 11 shows that many MBs are allocated a negative number of bits, so there would be a large bit-number mismatch if we simply set the negative values to zero. To decrease this mismatch, we perform re-optimization by reassigning the target number of bits for the frame among the MBs that were allocated a positive number of bits in the first optimization. The distribution of the allocated bits then becomes smoother. In the following step, we continue to skip MBs with negative solutions after the second optimization, and at the same time we skip the MBs with the smallest variance among those with a positive allocation to make up for the mismatch.
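The re-optimization idea can be sketched as a loop that drops MBs with negative allocations and re-solves Eq. (17) over the remaining MBs with the full frame budget. The version below is a simplification under stated assumptions: it repeats the pass until no negative allocation remains, and it omits the additional skipping of the smallest-variance MBs mentioned above. The function names and the example values are hypothetical, and `xi[i]` stands for $K_{R_i}/\alpha_i$.

```python
import math


def closed_form_oba(var_el, xi, n, r_t):
    """Eq. (17) for the listed MBs, with xi[i] = K_R_i / alpha_i."""
    weight = [x * m for x, m in zip(xi, n)]
    log_term = [math.log(v / x) for v, x in zip(var_el, xi)]
    offset = (r_t - sum(w * g for w, g in zip(weight, log_term))) / sum(weight)
    return [w * (offset + g) for w, g in zip(weight, log_term)]


def reoptimized_oba(var_el, xi, n, r_t):
    """Skip MBs with negative allocations and re-allocate r_t among the rest."""
    if r_t <= 0:
        raise ValueError("target bit number must be positive")
    active = list(range(len(var_el)))
    while True:
        alloc = closed_form_oba(
            [var_el[i] for i in active],
            [xi[i] for i in active],
            [n[i] for i in active],
            r_t,
        )
        keep = [i for i, bits in zip(active, alloc) if bits >= 0.0]
        if len(keep) == len(active):
            result = [0.0] * len(var_el)  # skipped MBs receive zero bits
            for i, bits in zip(active, alloc):
                result[i] = bits
            return result
        active = keep  # the set shrinks every pass, so the loop terminates


print(reoptimized_oba([100.0, 1.0, 50.0], [2.0, 2.0, 2.0], [1, 1, 1], 4.0))
```

Unlike clipping negative values to zero, the returned allocation still sums to the frame target, which is the mismatch the re-optimization is meant to remove.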

4.3.2. MB classiﬁcation

According to our observation, both $K_R$ and $\alpha$ vary with the coding source and strategy. We classify the MBs by their variance, denoted as $\{\sigma_i^2 \mid 1 \le i \le L\}$, and the mean of the variances, denoted as $\bar{\sigma}^2 = (1/L)\sum_{i=1}^{L} \sigma_i^2$. To keep the computational complexity and implementation cost low, all the MBs are classified into four classes as follows:

$$\begin{cases} C_1: \{MB_i \mid \sigma_i^2 \le \bar{\sigma}^2/2\}, \\ C_2: \{MB_i \mid \bar{\sigma}^2/2 < \sigma_i^2 \le \bar{\sigma}^2\}, \\ C_3: \{MB_i \mid \bar{\sigma}^2 < \sigma_i^2 \le 2\bar{\sigma}^2\}, \\ C_4: \{MB_i \mid \sigma_i^2 > 2\bar{\sigma}^2\}. \end{cases} \qquad (18)$$
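The classification rule (18) compares each MB variance with half, one and two times the mean variance. A minimal sketch follows; the function name and the example variances are hypothetical.

```python
def classify_mbs(variances):
    """Assign each MB to class 1..4 by Eq. (18), using the mean MB variance."""
    if not variances:
        raise ValueError("at least one MB variance is required")
    mean_var = sum(variances) / len(variances)
    classes = []
    for var in variances:
        if var <= mean_var / 2.0:
            classes.append(1)
        elif var <= mean_var:
            classes.append(2)
        elif var <= 2.0 * mean_var:
            classes.append(3)
        else:
            classes.append(4)
    return classes


# Hypothetical variances with mean 40: thresholds are 20, 40 and 80.
print(classify_mbs([5.0, 25.0, 45.0, 100.0, 25.0]))
```

Only one pass over the MB variances and three comparisons per MB are needed, in line with the stated goal of low complexity.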

Note that OBA is mainly decided by the MB variance; thus MBs of class $C_1$ are usually allocated a negative number of bits according to expression (17) and are skipped.

Fig. 12. Plot of $\alpha$ versus bitrate using the Mobile & Calendar sequence.

After classification, each class can be regarded as an independent input source in which all MBs are assumed to have similar mathematical properties and coding behavior, so that MBs belonging to the same class share the same $K_R$ and $\alpha$ at the class level. These can be estimated by rewriting expressions (12) and (14) as

$$K_{R_i} = \frac{R_{AC_i}}{P_{NZ_i}}, \qquad (19)$$

$$\alpha_i = \frac{1}{P_{NZ_i}} \ln\frac{\sigma_{EL_i}^2}{D_{EL_i}}. \qquad (20)$$
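Equations (19) and (20) translate directly into two small estimators. In the sketch below, the function names are hypothetical, and the `default` argument is an assumption that stands in for the default constants the paper mentions for cases where a parameter cannot be estimated, for instance when a class has no coded coefficients.

```python
import math


def estimate_k_r(r_ac, p_nz, default):
    """Eq. (19): K_R = R_AC / P_NZ; falls back to `default` when P_NZ is zero."""
    return r_ac / p_nz if p_nz > 0.0 else default


def estimate_alpha(var_el, d_el, p_nz, default):
    """Eq. (20): alpha = ln(sigma_EL^2 / D_EL) / P_NZ, with a fallback value."""
    if p_nz <= 0.0 or var_el <= 0.0 or d_el <= 0.0:
        return default
    return math.log(var_el / d_el) / p_nz


# Hypothetical measurements for one class.
print(estimate_k_r(300.0, 0.05, default=1.0))
print(estimate_alpha(100.0, 50.0, 0.05, default=1.0))
```

Substituting the estimated $\alpha$ back into (14) reproduces the measured distortion, which gives a quick consistency check on the measurements.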

5. Rate control

In this section, we describe our rate control scheme, which can be used for a two-layered MPEG-2 SNR encoder.

5.1. Rate model

According to the above analysis, we first derive our rate model. Combining Eqs. (1) and (13), we obtain Eq. (21), which can be substituted into Eq. (12) and then into Eq. (11) to derive our rate control model (22), where $Q_{BL} \ge Q_{EL}$:

$$P_{NZ} = \frac{Q_r - 1}{K_Q}, \qquad (21)$$

$$R_{EL} = C + K_R\, \frac{Q_{BL}/Q_{EL} - 1}{K_Q}. \qquad (22)$$
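Because model (22) is monotonic in $Q_{EL}$, it can be inverted to obtain the EL quantization parameter that meets a given bit budget. The rearrangement below follows directly from (22), assuming $R_{EL} \ge C$ and $K_R, K_Q > 0$; any rounding or clipping of $Q_{EL}$ to the quantizer values permitted by the encoder is not covered here.

```latex
R_{EL} - C = \frac{K_R}{K_Q}\left(\frac{Q_{BL}}{Q_{EL}} - 1\right)
\;\Longrightarrow\;
\frac{Q_{BL}}{Q_{EL}} = 1 + \frac{K_Q\,(R_{EL} - C)}{K_R}
\;\Longrightarrow\;
Q_{EL} = \frac{Q_{BL}}{1 + K_Q\,(R_{EL} - C)/K_R}.
```

As a sanity check, a budget of exactly $R_{EL} = C$ gives $Q_{EL} = Q_{BL}$, that is $Q_r = 1$, and a larger budget gives a finer EL quantizer.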

5.2. Frame level rate control

The frame level assigns a target number of bits to the current frame to be encoded. We simply use (23) to select the target number of bits for the frame, where $R$ denotes the bitrate, $F$ is the frame rate, $B_f$ (24) is a buffer feedback factor (ISO/IEC JTC1/SC29/WG11, N3908, 2001), $B$ is the current buffer level, and $B_s$ is the buffer size, set to $4 \cdot R/F$:

$$T = \max\left(\frac{R}{F}\, B_f,\ \frac{R}{4F}\right), \qquad (23)$$

$$B_f = \frac{B + 2(B_s - B)}{2B + (B_s - B)}. \qquad (24)$$
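The frame-level target of (23) and (24) can be sketched as follows. The function names are hypothetical, and the reading of (23) as the maximum of $(R/F)\,B_f$ and $R/(4F)$ is an assumption about the intended formula. With the buffer size fixed at $4R/F$, the feedback factor moves between 2 (empty buffer) and 0.5 (full buffer).

```python
def buffer_feedback(buffer_level, buffer_size):
    """Eq. (24): B_f = (B + 2 (B_s - B)) / (2 B + (B_s - B))."""
    free = buffer_size - buffer_level
    return (buffer_level + 2.0 * free) / (2.0 * buffer_level + free)


def frame_target(bitrate, frame_rate, buffer_level):
    """Eq. (23): T = max((R / F) * B_f, R / (4 F)), with B_s = 4 R / F."""
    per_frame = bitrate / frame_rate
    buffer_size = 4.0 * per_frame
    b_f = buffer_feedback(buffer_level, buffer_size)
    return max(per_frame * b_f, bitrate / (4.0 * frame_rate))


# Hypothetical setting: 3 Mbit/s at 25 frames/s with a half-full buffer.
print(frame_target(3_000_000, 25, 240_000))
```

A half-full buffer leaves the nominal budget $R/F$ unchanged, while fuller buffers reduce the target and emptier buffers raise it.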


Step 1.1 Computation of parameters

First, compute the MB variances $\sigma_i^2$; second, classify the MBs into classes, Class $j$, by (18); third, compute the parameters $K_{R_j}$, $\alpha_j$ and $K_{Q_j}$ for each Class $j$ from the previous frame (for the first frame, default values are chosen); fourth, let $K_{R_i}$, $\alpha_i$ and $K_{Q_i}$ of each MB be equal to $K_{R_j}$, $\alpha_j$ and $K_{Q_j}$ of the class to which it belongs. In practice, MBs in Class 1 and part of Class 2 are usually skipped after OBA, so some parameters cannot be estimated and are set to the default constants instead.

Step 1.2 Optimum bit allocation

With $\sigma_i^2$, $K_R$ and $\alpha_i$, we can perform OBA by (17) and our re-optimization scheme (Section 4.3) to compute the target number of bits for each MB.

Fig. 13. Flowchart of rate control at MB-level.

[Step 2] Compute $Q_{EL}$ for the $i$th macroblock

First, check the number of bits $R_{AC_i}$ allocated to the $i$th MB. If $R_{AC_i} < 0$, skip the $i$th MB; otherwise, compute $P_{NZ_i}$ by (25):

$$P_{NZ_i} = \frac{R_{AC_i}}{K_{R_i}}. \qquad (25)$$

Second, compute $Q_r$ by (13), from which the desired $Q_{EL_i}$ is obtained, and then encode the MB.
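Step 2 chains Eqs. (25), (13) and (1). A minimal sketch of that chain is shown below; the function name is hypothetical, `None` is used here as an assumed convention for a skipped MB, and the mapping of the resulting real-valued $Q_{EL}$ onto the quantizer values allowed by the encoder is left out.

```python
def el_quantizer_for_mb(r_ac, k_r, k_q, q_bl):
    """Return Q_EL for one MB, or None when the MB is to be skipped.

    r_ac : bits allocated to the MB by OBA
    k_r, k_q : class-level slopes of Eqs. (12) and (13)
    q_bl : BL quantization parameter
    """
    if r_ac < 0:
        return None                 # negative allocation: skip the MB
    p_nz = r_ac / k_r               # Eq. (25)
    q_r = 1.0 + k_q * p_nz          # Eq. (13)
    return q_bl / q_r               # Eq. (1) rearranged: Q_EL = Q_BL / Q_r


# Hypothetical values: 300 bits, K_R = 6000, K_Q = 8, Q_BL = 40.
print(el_quantizer_for_mb(300.0, 6000.0, 8.0, 40.0))
```

A zero allocation yields $Q_r = 1$ and therefore $Q_{EL} = Q_{BL}$, the coarsest setting the model permits.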

[Step 3] Update model parameters for each class

After encoding the $i$th MB belonging to Class $j$, compute $K_{R_j}$ by (19) and $K_{Q_j}$ by (13) for Class $j$ from the previously encoded MBs belonging to Class $j$; then update $K_{R_j}$ and $K_{Q_j}$.
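The source does not specify how the measurements of the previously encoded MBs in a class are combined. The sketch below assumes a pooled ratio over all coded MBs of the class, which is only one plausible choice; the class name, attribute names and example values are hypothetical.

```python
class ClassModel:
    """Running estimates of K_R and K_Q for one MB class (pooled-ratio assumption)."""

    def __init__(self, k_r_default, k_q_default):
        self.k_r = k_r_default      # used until the first coded MB is seen
        self.k_q = k_q_default
        self._bits = 0.0            # sum of R_AC over coded MBs
        self._p_nz = 0.0            # sum of P_NZ over coded MBs
        self._q_excess = 0.0        # sum of (Q_r - 1) over coded MBs

    def update(self, r_ac, p_nz, q_r):
        """Fold in the actual statistics of one encoded MB."""
        if p_nz <= 0.0:
            return                  # nothing measurable: keep current estimates
        self._bits += r_ac
        self._p_nz += p_nz
        self._q_excess += q_r - 1.0
        self.k_r = self._bits / self._p_nz      # Eq. (19) on pooled data
        self.k_q = self._q_excess / self._p_nz  # Eq. (13) solved for K_Q


model = ClassModel(k_r_default=6000.0, k_q_default=8.0)
model.update(r_ac=300.0, p_nz=0.05, q_r=1.4)
model.update(r_ac=700.0, p_nz=0.15, q_r=2.2)
print(model.k_r, model.k_q)
```

Pooling weights MBs with more non-zero coefficients more heavily, which damps the noisy ratios produced by MBs with very small $P_{NZ}$.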

6. Experimental results

In this section, we apply our approach to an SNR scalable encoder, which is based on the non-scalable MPEG-2 coder,¹ to verify the performance of our rate control algorithm. We test our rate control algorithm for controlling the EL bitrate with the BL encoded either in VBR mode or in CBR mode (TM5 rate control). We also compare our proposed algorithm with TM5 rate control when both are applied to EL compression.

The MPEG-2 SNR scalable encoder structure we employ is the same as that shown in Fig. 2. The raw sequences used in the test, shown in Fig. 14, are: (a) Boating (720 × 576, 4:2:0), 100 frames; (b) Mobile & Calendar (720 × 576, 4:2:0), 100 frames. In our experiments, the length of a GOP (group of pictures) is 15, and the distance between an I frame and a P frame is 3.

Tables 1 and 2 compare the PSNR performance of our proposed algorithm with that of TM5 rate control. In Table 1, the BL is encoded in VBR mode and the EL is encoded at different EL bitrates. Table 2 shows the case in which the BL is encoded in CBR mode (employing TM5 rate control) and EL rate control is performed by our proposed algorithm and by TM5 rate control, respectively. Both show that our rate control algorithm achieves a higher average PSNR than the TM5 rate control model. Examples of the per-frame PSNR comparison are plotted in Figs. 15 and 16.

Next, we compare the MSE at the MB level. As shown in Fig. 17, we take one frame of the Mobile & Calendar sequence and plot the MSE of the EL source frame, of the frame reconstructed after TM5 rate control, and of the frame reconstructed after our proposed algorithm, where the MBs are sorted by MSE in ascending order. The curve of our proposed algorithm rises more gradually, which implies that the picture appears smoother and its quality is more constant than with TM5 rate control.

¹ See http://www.mpeg.org/mpeg/mssg/#source

Fig. 14. Raw sequences.

Table 1
Comparison when the BL is encoded in VBR mode

| Video sequence | BL bitrate (bit/s) (VBR) | BL PSNR (dB) | EL bitrate (bit/s) | BL and EL average PSNR (dB), TM5 | BL and EL average PSNR (dB), Proposed | Gain (dB) |
|---|---|---|---|---|---|---|
| Boating (mquant = 50) | 3.9M | 29.974 | 2M | 30.968 | 31.488 | 0.520 |
| | | | 3M | 31.474 | 32.332 | 0.858 |
| | | | 4M | 31.968 | 33.095 | 1.127 |
| Mobile & Calendar (mquant = 50) | 4.3M | 27.097 | 2M | 27.755 | 27.979 | 0.224 |
| | | | 3M | 28.070 | 28.485 | 0.415 |
| | | | 4M | 28.427 | 29.022 | 0.595 |

Table 2
Comparison when the BL is encoded in CBR mode

| Video sequence | BL bitrate (bit/s) (CBR) | BL PSNR (dB) | EL bitrate (bit/s) | BL and EL average PSNR (dB), TM5 | BL and EL average PSNR (dB), Proposed | Gain (dB) |
|---|---|---|---|---|---|---|
| Boating (mquant = 50) | 3M | 27.975 | 2M | 29.502 | 30.339 | 0.837 |
| | | | 3M | 30.130 | 31.371 | 1.241 |
| | | | 4M | 30.654 | 32.265 | 1.611 |
| Mobile & Calendar (mquant = 50) | 3M | 28.314 | 2M | 28.879 | 29.733 | 0.854 |
| | | | 3M | 29.224 | 30.459 | 1.235 |
| | | | 4M | 29.622 | 31.013 | 1.391 |

Fig. 15. PSNR performance comparison between TM5 and our proposed algorithm for EL coding when BL is encoded in the CBR mode and when EL is encoded at 3 Mbit/s.

Fig. 16. PSNR performance comparison between TM5 and our proposed algorithm for EL coding when BL and EL are encoded at 3 Mbit/s, respectively.

Fig. 17. Comparison of the MSE of the reconstructed frame in MB-level.

the bitrate and $Q_r$. Second, we find that the conventional linear rate model proposed for non-scalable coding needs to be improved; specifically, the gradient varies significantly in EL coding, and the scheme proposed in this paper deals with this problem. Third, because the rate control parameters are not stable enough for robust video compression, we propose an MB classification method for performing EL OBA. Fourth, we propose a re-optimization scheme for EL OBA to decrease the rate control error. In addition, other useful relationships are integrated into our EL rate control framework. Experimental results show that our rate control algorithm performs well for SNR EL coding and improves on the classical TM5 rate control.

References

Arnold, J.F., Frater, M.R., Wang, Y., 2000. Eﬃcient drift-free signal-to-noise ratio scalability. IEEE Trans. Circuits Syst. Video Technol. 10, 70–82.

Chiang, T., Zhang, Y.-Q., 1997. A new rate control scheme using quadratic rate distortion model. IEEE Trans. Circuits Syst. Video Technol. 7, 246–250.

Ding, W., Liu, B., 1996. Rate control of MPEG video coding and recording by rate-quantization modeling. IEEE Trans. Circuits Syst. Video Technol. 6, 12–20.

Gallant, M., Kossentini, F., 2001. Rate-distortion optimized layered coding with unequal error protection for robust internet video. IEEE Trans. Circuits Syst. Video Technol. 11, 357–372.

Gallant, M., Kossentini, F., 1999. Eﬃcient scalable DCT-based video coding. In: Proceedings of the 1999 IEEE Canadian Conference on Electrical and Computer Engineering, vol. 2, Shaw Conference Center, Edmonton, Alberta, Canada, pp. 9–12.

Gersho, A., Gray, R.M., 1992. Vector Quantization and Signal Compression. Kluwer Academic Publishers, Norwell, MA.

Ghanbari, M., 1999. Video Coding: An Introduction to Standard Codecs. The Institution of Electrical Engineers, London, UK.

Hang, H.-M., Chen, J.-J., 1997. Source model for transform video coder and its application. I. Fundamental theory. IEEE Trans. Circuits Syst. Video Technol. 7, 287–298.

He, Z., Mitra, S.K., 2002a. A linear source model and a uniﬁed rate control algorithm for DCT video coding. IEEE Trans. Circuits Syst. Video Technol. 11, 970–982.

He, Z., Mitra, S.K., 2002b. Optimum bit allocation and accurate rate control for video coding via ρ-domain source modeling. IEEE Trans. Circuits Syst. Video Technol. 12, 840–849.

He, Z., Kim, Y.K., Mitra, S.K., 2001. Low-delay rate control for DCT video coding via ρ-domain source modeling. IEEE Trans. Circuits Syst. Video Technol. 11, 928–940.

Information Technology, ISO/IEC, 1995. Generic coding of moving pictures and associated audio information: Video.

ISO/IEC JTC1/SC29/WG11, N3908, 2001. MPEG-4 Video Veriﬁcation Model 18.0, January.

Jayant, N.S., Noll, P., 1984. Digital Coding of Waveforms: Principles and Applications to Speech and Video. Prentice-Hall, Englewood Cliffs, NJ.

Lam, E.Y., Goodman, J.W., 2000. A mathematical analysis of the DCT coeﬃcient distributions for images. IEEE Trans. Image Processing 9, 1661–1665.

Lee, B., Park, K., Hwang, J., 1997. H.263-based SNR scalable video codec. IEEE Trans. Consumer Electron. 43, 614–622.

Miloslavsky, E., Zakhor, A., 1999. Rate control for layered video compression using matching pursuits. In: Proceedings of Image Processing, ICIP 99, vol. 2, 1999, pp. 357–361.

MPEG-2 Video Test Model 5, ISO/IEC JTC1/SC29/WG11, 1993. MPEG93/457, April.

Ribas-Corbera, J., Lei, S., 1999. Rate control in DCT video coding for low-delay communications. IEEE Trans. Circuits Syst. Video Technol. 9, 172–185.

Ribas-Corbera, J., Neuhoff, D., 1996. On the optimal motion vector accuracy for block-based motion-compensated video coders. In: Proceedings of the IS&T/SPIE Dig. Video Comp.: Alg. & Tech., San Jose, CA, pp. 302–314.

Tao, B., Dickinson, B.W., Peterson, H.A., 2000. Adaptive model-driven bit allocation for MPEG video coding. IEEE Trans. Circuits Syst. Video Technol. 10, 147–157.

Wang, L., 2000. Rate control for MPEG video coding. Visual Communication and Image Processing, SPIE, vol. 2501.

Wang, Y., Ostermann, J., Zhang, Y.-Q., 2001. Digital Video Processing and Communication. Prentice-Hall, Englewood Cliffs, NJ, Ch. Error control in video communication.

Wilson, D., Ghanbari, M., 1999. Optimization of MPEG-2 SNR scalable codecs. IEEE Trans. Image Processing 8, 1435–1438.

Wu, D., Hou, Y.T., Zhu, W., Zhang, Y.-Q., Peha, J.M., 2001. Streaming video over the Internet: approaches and directions. IEEE Trans. Circuits Syst. Video Technol. 11, 282–300.