Video Encoding: A Case Study - Estimating MTT WSS

6.1.3 Video Encoding: A Case Study

We next evaluated the performance of real-time MPEG-2 video encoding applications when using our heuristics by emulating the motion estimation portion of the encoding within SESC. Motion estimation is the most compute- and memory-intensive portion of MPEG video encoding. We emulated motion estimation in our experiments by mimicking its potential memory reference pattern. As discussed in Chapter 1, as the core counts of multicore platforms increase, and the processing power of individual cores remains similar (or even decreases), most compute-intensive applications such as video encoding will need to become multithreaded to continue to achieve performance gains. Such performance gains are mandatory if the video quality demanded by users continues to increase.

We mimicked a potential memory reference pattern of multithreaded motion estimation by splitting each video frame into identically-sized horizontal slices, each of which is processed by a different task of the same MTT. All motion estimation requires a search; for each task in an MTT, this involves searching a memory region that includes both its assigned slice and several nearby slices, such as the slices immediately above and below its assigned slice. In our experiments, tasks reference the memory region of their assigned slice first, and then search progressively more distant slices. It is often desirable to search the largest space possible (to approximate an exhaustive search), so we assumed that tasks were backlogged in that they continue the search until either their execution time or search space is exhausted. The memory regions that are referenced by each task in an MTT overlap more as the size of the region searched per task increases.

[Figure 6.1 image: (a) a frame of 1280 x 720 pixels; (b) the frame divided into slices of 1280 x 90 pixels, labeled "Task 0 slice" through "Task 7 slice"; (c) the "Task 4 slice", numbered 0, with the surrounding slices numbered by search order.]

Figure 6.1: The memory reference pattern for multithreaded motion estimation. The insets show (a) a 720p HDTV video frame; (b) the same frame divided into eight slices, each processed by a different task of the same MTT; and (c) the search pattern for the task that is processing the fifth slice (the numbers on the right-hand side of each slice indicate the order in which the slices would be incorporated into the search).

Example (Figure 6.1). As an example, consider Figure 6.1, which concerns a 720p HDTV video, containing frames of 1280 x 720 pixels (900K per frame assuming one byte per pixel), as shown in inset (a). This video could be divided into eight slices of size 1280 x 90 (112.5K per frame), as shown in inset (b), each of which is processed by a different task in the same MTT. The search pattern for the task that is processing the fifth slice, assuming eight slices and a corresponding eight-task MTT, is shown in inset (c). In inset (c), the numbers on the right-hand side of each slice indicate the order in which the slices would be incorporated into the search.
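The slice arithmetic and search order in this example can be sketched in a few lines of Python. This is only an illustration, not the emulation code used in SESC: the function names are hypothetical, task indices are assumed to start at zero (so the fifth slice belongs to task 4, as in the figure), and the step at which a slice joins the search is assumed to equal its distance from the assigned slice, which is what the numbering in inset (c) suggests.

```python
def slice_bytes(width, height, n_slices, bytes_per_pixel=1):
    """Size in bytes of one horizontal slice of a frame."""
    if height % n_slices != 0:
        raise ValueError("frame height must divide evenly into slices")
    return width * (height // n_slices) * bytes_per_pixel


def search_sequence(assigned, n_slices):
    """Group slice indices by the step at which they join the search.

    Step 0 is the assigned slice; step d holds the slices that are
    d positions away from it (an assumption based on Figure 6.1(c)).
    """
    if not 0 <= assigned < n_slices:
        raise ValueError("assigned slice index out of range")
    farthest = max(assigned, n_slices - 1 - assigned)
    return [
        [s for s in range(n_slices) if abs(s - assigned) == d]
        for d in range(farthest + 1)
    ]


if __name__ == "__main__":
    # 720p frame, eight slices, one byte per pixel.
    print(slice_bytes(1280, 720, 8) / 1024)   # slice size in K
    print(search_sequence(4, 8))              # search steps for task 4
```

For the eight-slice 720p case, each slice is 115,200 bytes (112.5K), and the task assigned to slice 4 searches slices 3 and 5 next, then 2 and 6, then 1 and 7, and finally slice 0.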

Both GEDF scheduling and the heuristic that performed best in Section 6.1.2 (cache utilization threshold of 0%, cache-aware policy (1), lost-cause policy (1), and phantom tasks) were used to schedule task sets on the eight-core architecture. MTTs were generated according to the video quality level of the video that they represented. These levels define resolutions and frame rates that are typical for real applications, some of which are more demanding than others. Table 6.5 presents these levels and their corresponding MTTs, along with a use for each video quality level in practice.³ Note that the only difference between levels 1 and 2, and levels 3 and 4, is the number of tasks in the MTT that processes each video frame.

Level  Resolution   FPS  WSS    Task Count  Period  Use in Practice
-----  -----------  ---  -----  ----------  ------  ----------------------------------------------
1      1920 x 1080  30   2025K  8           33      1080p HDTV (high-quality), moderate frame rate
2      1920 x 1080  30   2025K  5           33      1080p HDTV (high-quality), moderate frame rate
3      1280 x 720   60   900K   8           16      720p HDTV (mid-quality), high frame rate
4      1280 x 720   60   900K   4           16      720p HDTV (mid-quality), high frame rate
5      720 x 480    30   338K   1           33      standard TV and DVD
6      352 x 288    30   99K    1           33      video conferencing
7      320 x 240    24   75K    1           41      high-end portable devices
8      176 x 144    15   25K    1           66      low-end portable devices

Table 6.5: Video quality levels and their corresponding MTTs. The column labeled "FPS" indicates the frames-per-second of the video, and the last column provides one use for each video quality level in practice. All tasks have an execution cost of one; jobs are expected to be backlogged.
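The WSS and Period columns of Table 6.5 appear to follow directly from the resolution and frame rate. The Python sketch below reproduces them under two assumptions that are inferred from the table values rather than stated in the text: the WSS is the size of one frame at one byte per pixel, rounded to the nearest K, and the period is the frame interval in milliseconds, truncated to an integer. The names `LEVELS`, `wss_k`, and `period` are hypothetical.

```python
# (width, height, frames per second) for each quality level in Table 6.5.
LEVELS = {
    1: (1920, 1080, 30),
    2: (1920, 1080, 30),
    3: (1280, 720, 60),
    4: (1280, 720, 60),
    5: (720, 480, 30),
    6: (352, 288, 30),
    7: (320, 240, 24),
    8: (176, 144, 15),
}


def wss_k(width, height, bytes_per_pixel=1):
    """Frame size in K (1024 bytes), rounded half up to an integer."""
    return int(width * height * bytes_per_pixel / 1024 + 0.5)


def period(fps):
    """Frame interval, assumed to be in milliseconds and truncated."""
    return 1000 // fps


if __name__ == "__main__":
    for level, (w, h, fps) in LEVELS.items():
        print(level, f"{w} x {h}", fps, f"{wss_k(w, h)}K", period(fps))
```

Under these assumptions the computed values match every row of the table, for example 2025K and 33 for the 1080p levels and 25K and 66 for level 8.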

Video encoding task sets were randomly generated according to the following methodology. System utilization was either 50% or 100%, and the video quality levels for the MTTs in each task set were uniform over [1,8], [1,6], [7,8], or [1,4]. Since there is little freedom when choosing task parameters for the MTTs in these task sets, only 10 task sets were generated for each combination of system utilization and video quality level.
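The resulting experimental grid can be enumerated as in the sketch below. It covers only what the paragraph states: two system utilizations, four quality-level ranges, ten task sets per combination, and a uniform draw of a level from the chosen range. How MTTs are accumulated until a task set reaches its target utilization is not described here, so that step is deliberately left out. All names are hypothetical.

```python
import itertools
import random

UTILIZATIONS = (0.5, 1.0)                          # 50% or 100% system utilization
LEVEL_RANGES = ((1, 8), (1, 6), (7, 8), (1, 4))    # inclusive quality-level ranges
SETS_PER_COMBINATION = 10


def experiment_grid():
    """Every (utilization, level range) combination that was tested."""
    return list(itertools.product(UTILIZATIONS, LEVEL_RANGES))


def draw_level(level_range, rng):
    """Pick a video quality level uniformly from an inclusive range."""
    low, high = level_range
    return rng.randint(low, high)


if __name__ == "__main__":
    grid = experiment_grid()
    print(len(grid), "combinations,", len(grid) * SETS_PER_COMBINATION, "task sets")
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    print([draw_level((1, 4), rng) for _ in range(5)])
```

With eight combinations and ten task sets each, the study comprises 80 task sets in total.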

All results are shown in Table 6.6. In almost all cases, the tested heuristic outperformed GEDF, resulting in an average 10.65% increase in IPC over GEDF across all experiments. Note that an increase in IPC can allow for a proportionate increase in the number of videos or clients supported by the platform, an increase in the space searched for each video during motion estimation (to improve encoding quality), or upgrades in the quality level of some videos.

In document On the design and implementation of a cache-aware soft real-time scheduler for multicore platforms (Page 141-143)