Experimental Setup - Developing a high definition video quality of experience model based on in

best quality. In H.264, quantization is controlled by the QP, which ranges between 0 and 51. An equivalent quantizer step size can be calculated for each QP. Step size approximately doubles for every increase of 6 in QP.
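The doubling rule above can be illustrated with a short sketch. It assumes the commonly quoted H.264 base step size of 0.625 at QP 0 and uses a smooth exponential, so the values are approximations rather than the exact tabulated step sizes; the function name `approx_qstep` is made up for this illustration.

```python
def approx_qstep(qp: int) -> float:
    """Approximate H.264 quantizer step size for an integer QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("H.264 QP must lie between 0 and 51")
    # Assumption: step size 0.625 at QP 0, doubling for every increase of 6 in QP.
    return 0.625 * 2 ** (qp / 6)


if __name__ == "__main__":
    for qp in (0, 6, 12, 26, 51):
        print(f"QP {qp:2d} -> approximate step size {approx_qstep(qp):.2f}")
```

Because the step size grows exponentially with QP, equal QP increments remove progressively more detail at the high end of the range.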

Noise Factors

The noise factors considered for this experiment were motion, complexity and location (indoor or outdoor scene). Table 4.2 shows the noise factors and their levels. Taguchi DoE requires that each condition, i.e. a specific combination of all 4 control parameters, be repeated for every combination of noise factors. With 3 noise factors at 2 levels each, there were 2^3 = 8 noise combinations, so each condition would have to be repeated 8 times. In two cases, when motion and complexity were both high and when both were low, location was not accounted for. We ignored location in these cases because once motion and complexity are both high or both low, location will not affect the quality. This left 6 noise combinations. Figure 4.1 shows the possible combinations after ignoring location in these two cases. HHI and HHO effectively became HHI, whereas LLI and LLO became LLI.
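The reduction of the noise combinations can be sketched as follows. The three-letter codes follow the HHI/LLO pattern used in the text (motion, complexity, location); the helper name `collapse` is hypothetical and only illustrates the merging rule described above.

```python
from itertools import product

# Two levels per noise factor: High/Low motion, High/Low complexity, Indoor/Outdoor.
levels = {"motion": "HL", "complexity": "HL", "location": "IO"}
all_codes = ["".join(combo) for combo in product(*levels.values())]


def collapse(code: str) -> str:
    """Ignore location when motion and complexity share the same level."""
    motion, complexity, _location = code
    if motion == complexity:
        return motion + complexity + "I"  # HHO -> HHI, LLO -> LLI
    return code


categories = sorted({collapse(code) for code in all_codes})

if __name__ == "__main__":
    print(len(all_codes), "raw combinations:", all_codes)
    print(len(categories), "after merging:", categories)
```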

Table 4.2: Noise Factors and Levels.

Noise Parameters    Level 1    Level 2
Motion              High       Low
Complexity          High       Low
Location            Indoor     Outdoor

4.3 Experimental Setup

The content domain experiment required a test bed and video sequences. The following sections describe the test bed, video sequences, test layout, user task and the customization that was required for the content domain experiment.

4.3.1 Test Bed

The computers used for this experiment were Intel Core i5 machines running at 3.60 GHz with 4 GB RAM and 64-bit Windows 7 Enterprise with Service Pack 1 installed. These machines used integrated Intel HD graphics. The monitors were ViewSonic VS 13239 LED 1080p Full HD displays.

Figure 4.1: Combination of noise factors.

Table 4.3: L9 array with control factors.

Exp. No    A       B    C     D
1          200     0    12    5
2          200     2    26    3
3          200     4    51    1
4          800     0    26    1
5          800     2    51    5
6          800     4    12    3
7          1200    0    51    3
8          1200    2    12    1
9          1200    4    26    5
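The defining property of an orthogonal array such as the L9 in Table 4.3 is that, for any two columns, every pair of levels occurs equally often (here exactly once). The sketch below checks this for the values in the table; the function name `is_orthogonal` is an illustrative assumption, not part of the original experiment.

```python
from collections import Counter
from itertools import combinations

# Rows of Table 4.3: control factors A, B, C, D for experiments 1..9.
L9 = [
    (200, 0, 12, 5), (200, 2, 26, 3), (200, 4, 51, 1),
    (800, 0, 26, 1), (800, 2, 51, 5), (800, 4, 12, 3),
    (1200, 0, 51, 3), (1200, 2, 12, 1), (1200, 4, 26, 5),
]


def is_orthogonal(rows) -> bool:
    """Return True if every pair of columns contains each level pair equally often."""
    columns = list(zip(*rows))
    for col_a, col_b in combinations(columns, 2):
        n_pairs = len(set(col_a)) * len(set(col_b))
        counts = Counter(zip(col_a, col_b))
        expected = len(rows) / n_pairs
        if len(counts) != n_pairs or any(c != expected for c in counts.values()):
            return False
    return True


if __name__ == "__main__":
    print("Table 4.3 is orthogonal:", is_orthogonal(L9))
```

This balance is what allows 9 runs to stand in for the 3^4 = 81 runs of a full factorial over four three-level factors.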

4.3.2 Selection of Test Sequences

We extracted benchmark HD quality videos from Blu-ray movies supporting H.264 compression and full HD resolution of 1920 × 1080. The content was selected from different movies encompassing many different genres. The degraded videos were created from these benchmark videos by configuring H.264 compression with different parameter settings. Altogether, 54 clips were used for the experiment, covering 6 categories of content type. For each of the 9 conditions in Table 4.3, every observer was shown a group of 6 videos, one per content category. This fulfilled the Taguchi DoE requirement that each condition be repeated for every noise combination.
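The clip count follows from crossing the 9 conditions with the 6 content categories. A minimal sketch, in which the category codes other than HHI and LLI are assumed to follow the same motion/complexity/location naming pattern:

```python
from itertools import product

conditions = range(1, 10)  # experiment numbers 1..9 from Table 4.3
# Assumed category codes: motion, complexity, location (I = indoor, O = outdoor).
categories = ["HHI", "HLI", "HLO", "LHI", "LHO", "LLI"]

# Every condition is paired with every noise category.
clips = list(product(conditions, categories))

if __name__ == "__main__":
    print(len(clips), "degraded clips in total")
    print("Clips for condition 1:", [c for c in clips if c[0] == 1])
```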

4.3.3 Test Layout

After selecting an L9 orthogonal array with 4 columns, one for each of the 4 control factors, motion, complexity and location were treated as noise factors. Table 4.3 shows the L9 array with control factors. We randomized the whole experiment process so that all the videos were presented to the viewers in random order. A group of 16 observers volunteered to participate in the experiment. They were screened to confirm that they had no prior experience in video compression or production. Each observer was shown 54 clips, each 10 seconds long. The hardware playing these videos was set up in such a way that the volunteers were not required to move from their seats. The distance and height from the screens were adjusted accordingly. Larger screens were placed further away from the viewers than smaller screens. For the test setup, the recommendations in [87] were followed closely.

In each assessment, the undistorted/benchmark video was played, followed by a delay of 3 seconds, and then the distorted video was played. Approximately 5 seconds were given for user feedback selection. Feedback was given by assigning a number between 1 and 5, where 1 represented the lowest quality and 5 the highest. The users were required to select a level of quality, based on their perception, from excellent, good, fair, poor and bad. A corresponding score was recorded for each selection. Approximately 30 seconds were required to complete one assessment. After 6 assessments a break of 3 minutes was given. This completed one segment. After 6 segments, a break of 5 minutes was allowed instead of 3 minutes. The whole experiment was completed within approximately 45 minutes. Initially, 5 minutes were spent explaining the experimental procedure, and at the end 10 minutes were taken up by filling in the feedback form.
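The figure of roughly 30 seconds per assessment can be checked with simple arithmetic. The sketch below assumes the reference clip has the same 10-second length as the distorted clip, which the text does not state explicitly, and it deliberately leaves the breaks out of the total.

```python
# Durations of one assessment, in seconds.
REFERENCE_S = 10   # assumption: reference clip is as long as the distorted clip
DELAY_S = 3        # pause between reference and distorted clip
DISTORTED_S = 10   # each clip was 10 seconds long
FEEDBACK_S = 5     # approximate time allowed for selecting a score

assessment_s = REFERENCE_S + DELAY_S + DISTORTED_S + FEEDBACK_S

N_ASSESSMENTS = 54
viewing_min = N_ASSESSMENTS * assessment_s / 60  # excludes all breaks

if __name__ == "__main__":
    print(f"One assessment: {assessment_s} s")
    print(f"All {N_ASSESSMENTS} assessments without breaks: {viewing_min:.1f} min")
```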

4.3.4 User Task

Observers were required to assess 54 videos in approximately 45 minutes. Their task was to assess each degraded video in comparison to the original/undistorted video. They were briefed only about the experimental process and were not trained on artefacts or on the 5-level Likert scale used for this experiment. They were asked to enter their feedback about the perceived quality. At the end of the experiment they were asked to fill in a feedback form containing questions about the experimental process and the artefacts presented in the experiment. The findings are discussed in Section 4.4.
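The way a selected quality label turns into a recorded score can be sketched as below. The mapping follows the 1 (lowest) to 5 (highest) scale described for this experiment; the sample ratings and the function name `mean_score` are purely illustrative and are not data from the study.

```python
from statistics import mean

# Five-level scale: 1 is the lowest quality, 5 the highest.
SCORE = {"bad": 1, "poor": 2, "fair": 3, "good": 4, "excellent": 5}


def mean_score(labels):
    """Average the numeric scores for a list of quality labels."""
    return mean(SCORE[label.lower()] for label in labels)


if __name__ == "__main__":
    # Hypothetical ratings given by four observers to one clip.
    example = ["good", "fair", "excellent", "good"]
    print("Mean score:", mean_score(example))
```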

4.3.5 Customization

From earlier work [74] [75] [76], we realized that to record the true perception of an observer, we needed to reduce the effects of boredom as well as the memory and recency effects. Playing a benchmark/undistorted video in each assessment ensured that the user had a recent reference in mind against which to compare the quality, and thus minimized the memory effect. The experiment was designed to present conditions randomly, so users could form no expectation of the level of quality of the next assessment based on the current one [101]. This reduced the effect of either step-wise increasing or step-wise decreasing quality levels. Without randomization the user would be scoring on expectations of quality rather than a true perception of quality. A few factors that could have contributed to boredom are the length of the experiment, the length of the selected videos and the content displayed. In order to reduce boredom we made an effort to keep the time required for the experiment to less than an hour. In addition, we made an effort to keep the experiment interesting and exciting so that the boredom effect did not come into play. Seferidis introduced the term “forgiveness effect” for the phenomenon in which users tend to forgive impaired video when it is followed by a substantial period of high quality or unimpaired video [74] [102]. The forgiveness effect was reduced by keeping the video length to 10 seconds and playing the content randomly. This ensured that the observers were rating only the immediate quality rather than suffering from the forgiveness effect. For this experiment we denied the users any control over the flow of the experiment, and they were not given any functionality to select the genre of video.
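A randomized presentation order of this kind can be sketched with a seeded shuffle. The text does not say how the order was generated or whether it differed between observers; the per-observer seeding, the function name `presentation_order` and the `base_seed` value are assumptions made for illustration only.

```python
import random


def presentation_order(n_clips: int, observer_id: int, base_seed: int = 2024):
    """Return a reproducible random ordering of clip indices for one observer."""
    # Assumption: each observer gets an independent, reproducible order.
    rng = random.Random(f"{base_seed}-{observer_id}")
    order = list(range(n_clips))
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    for observer in range(1, 4):
        print(f"Observer {observer}:", presentation_order(54, observer)[:10], "...")
```

Seeding the generator keeps each sequence reproducible for later analysis while still preventing step-wise increasing or decreasing quality levels from appearing by design.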

In document Developing a high definition video quality of experience model based on influential parameters : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North (Page 73-76)