• No results found

A number of both open source and closed source tools is available to process multimedia content in a centralized way and some of them also in a distributed way. To author’s knowledge with the only exception discussed below, none of them has been integrated into the Grid environment and none of them has support for distributed storage (unless it is emulated by operating system as a local filesystem). The most important tools and approaches are listed in this section.

7.2.1 Grid and Distributed Storage Infrastructure

There are numerous projects aimed at building computational Grid infrastructure for var- ious purposes in U. S. A., Europe and Japan. These projects are ranging from specific- purpose built Grids for selected applications to general infrastructure projects. As a part of the Grid infrastructure these projects provide large computational power in the form of Linux PC clusters that are being rapidly expanded each year because of the cost-efficiency of this solution which we target our attention on. The Grid activities are covered by METACenter project [78] in the Czech Republic.

7.2.2 Video Processing Tools

Currently a lot of tools is available for multimedia transcoding (list of freely available tools can be found e. g. in [69] and [70], description of most important commercial tools can be found in [20])—some of them being open source and some of them closed-source—and vast majority of them doesn’t allow distributed encoding, while some allow for distributed processing in homogenous environments with some centralized shared storage capacity.

Due to its highly modular architecture, thetranscode[55] tool supports transcoding

between almost all common video distribution formats except for those for which no open- source freely available implementation of encoding library exist. It also allows advanced processing that is needed like high quality de-interlacing and down-sampling. There is a simple computation distribution available based on PVM [89] and shared filesystem sup- ported in underlying operating system (typically NFS). Thetranscodetool can also be

used as video and sound acquisition tool on Linux using Video4Linux interface.

Other tool available for general multimedia transcoding is MEncoder which is part of MPlayer software suite [81].

However none of above mentioned tools support transcoding to RealMedia format, which is currently one of the most popular formats for video streaming delivery. The company producing RealMedia encoder decided to provide source code of all applications for research and development purposes under Helix Community Project [72]. Thus it is possible to explore integration of this format into asynchronous encoding environment.

7.2.3 Automated Distributed Video Processing

Shortly after our Distributed Encoding Environment, there has been another system pub- lished based on similar ideas called “A Fully Automated Fault-tolerant System for Dis- tributed Video Processing and Offsite Replication” [44]. This system uses similar overall architecture to ours based on parallelizing the encoding in a distributed computing envi- ronments by splitting the encoding into smaller chunks that are encoded in parallel.

As the system doesn’t use distributed file system with replica support, it handles the data replication using Condor-related tools Stork [46] and DiskRouter [45]. Furthermore, the claimed fault-tolerancy is understood and handled only on job scheduling level and the system actually demonstrates fault-tolerancy of the Condor-G [51] scheduling system which it is based on.

Chapter 8

Distributed Encoding

Environment

In recent years there has been a growing demand for creating video archives available on the Internet ranging from archives of university lectures [90, 67], archives of medical videos and scientific experiments recordings to business and entertainment applications. Building these media libraries requires huge processing and storage capacities.

In this chapter, we describe a system called Distributed Encoding Environment (DEE) [39] that is designed to utilize Grid computing and storage infrastructure. This chapter is organized as follows: in Section 8.1 we propose architecture of the DEE, in Section 8.2 suit- able data and processor scheduling algorithms are analyzed, Section 8.3 briefs prototype implementation and evaluates its performance.

8.1 Model of Distributed Processing

There are two possible approaches to building a distributed video processing system dif- fering in granularity of the parallelization:

1. parallelization on level of compression algorithm, i. e. fine-grained parallelization of the compression algorithm itself,

2. parallelization on data level, i. e. coarse-grained parallelizing the whole encoding process by splitting the material and encoding the resulting parts in parallel. While the former option is suitable for semi-asynchronous processing like live streaming, it adds significant overhead and almost prevents reasonable linear scalability on distributed processing infrastructure without single shared memory because of several reasons. It usu- ally involves substantial synchronization among the distributed processes—e. g. I-frames need to be handled before processing of P- and B-frames can occur. It also requires move- ment of source data to the processing node just before the calculation, as the data is not available well in advance, and transfer of resulting data back. As the source data is in this case usually not available in advance, it is hard to schedule data movements. Furthermore, the data movements in this model require low-latency transfer for efficient processing and thus it is impossible to utilize distributed storage infrastructure1 Third problem is that

fine-grained parallelization requires modifications to all source and target codecs in use, which is very hard as it might comprise tens of different algorithms to parallelize.

1Theoretically, it would be possible to utilize some highly experimental and not very affordable storage sys-

We have opted for the latter approach for the following reasons: since the asynchronous processing relaxes latency constraint, we may assume that the source data is completely acquired before the processing, and also because our target is to build a system that works faster than real-time and we can suppose whole material is available in advance anyway. Compared to parallelization on compression algorithm level, the parallelization on data level is codec-independent and thus the same architecture and implementation can be used for many input/output formats. Furthermore, it is possible to use it with target formats for which there is no open-source codec implementation and the only condition is that there is an efficient way for merging resulting chunks together.

The proposed workflow for the distribution of the processing (Figure 8.1) looks as fol- lows: the source data is split into chunks which are then encoded in parallel and the re- sulting data is merged back into the target data. The goal is then tominimize completion time of the last finishing job of the parallel phase. Although we have relaxed latency require- ments posed on the asynchronous processing and initial and final phases count together to overall latency of the processing, we require that these two phases should be much faster than the parallel phase, thus making the processing effective from the real-user point of view. The source chunks for the parallel phase are stored in distributed storage (possibly in multiple copies for performance and reliability purposes) to be effectively accessible by distributed processing nodes.

Source Data Data Chunking Chunk Merging Chunk Processing Chunk Processing Chunk Processing Chunk Processing

FIGURE8.1: Workflow in the Distributed Encoding Environment model of

processing distribution.

As the source data is complete before the processing, we may split the parallel pro- cessing into uniform chunks, which makes it possible to create a scheduling algorithm belonging to PO class as shown in Section 8.2.

8.1.1 Conventions Used

The overview of the infrastructure model used throughout this chapter is given in Fig- ure 8.2, comprising data sources, storage depots, processing nodes and the network infras-

8.1. MODEL OF DISTRIBUTED PROCESSING 58

processing node data source

router/switch

storage depot

FIGURE8.2: Model of target infrastructure.

tructure with links and active elements (routers/switches). In order to maintain consistent notation throughout this chapter, we also define a number of symbols below.

Definition 8.1 (Data transcoding) Transformation of (multimedia) data from a source for-

mat to a target format is called transcoding. 2

Definition 8.2 (Data prefetch) Data prefetch2 is an act of moving data closer to the pro-

cessing infrastructure during the time period between the job is scheduled and the job is

run. 2

There is also a number of symbols and variables used, some of which are also provided with deeper explanation where appropriate:

t time t0 now

p processing node P set of processing nodes

d the depot where the data to be processed are stored

Du set of depots scheduled/used for actu-

ally accessing the data to be processed in task u

D set of depots that store data to be pro- cessed (all depots unless indicated otherwise)

u (type of) processing task

U set of processing tasks (all the tasks have the same length)

U set of tasks scheduled to processor p lu length of processing task u; units [Mb]

2In Grid community, ”data stage-in” term is often used as an equivalent to data prefetch.

3This information can be theoretically obtained from most of current advanced schedulers. However there

are a few issues that make it partially theoretical functionality only:

tsched_free

p information from job scheduler

in what time the processor p will be available3; units [s]

sp,u processing performance4of processor p

on (type of) task u; unitsMb . s−1

s0

p,u resulting material production perfor-

mance of processor p on (type of) task u; unitsMb . s−1

bD,p(t) download capacity (bandwidth)

from depot set D to processor p in time t as discussed in Section 8.2.2; units

Mb . s−1

bp,D(t) upload capacity (bandwidth) from

processor p to depot set D in time t as discussed in Section 8.2.2; units 

Mb . s−1