
Distributed processing of current high-end multimedia data streams poses many interesting challenges, as such streams can easily comprise data flows of multiple gigabits per second. Depending on whether minimum latency is required by the target application, we can distinguish two classes of problems:

synchronous or interactive (real-time, on-line, etc.), which require the latency of processing and distribution to be as low as possible, as in virtual collaborative environments (videoconferencing, teleconsulting),

asynchronous or non-interactive (off-line), where the latency limitation is not as strict, as in streaming, digital broadcasting, and most other uni-directional transmissions.

While in the former case the processing is usually limited to some extent and the scalability of the target environment is assessed according to the number of clients supported or the bandwidth that can be processed and distributed, the latter case may involve much more sophisticated processing. Each of these cases poses its own set of problems, and thus they were studied separately in this work.

10.1.1 Synchronous processing

Traditional synchronous distribution infrastructure in IP networks has been based on multicast with its theoretically optimum scalability. However, when deploying multicast, users depend on network administrators for whether multicast works in a given network, and thus the multicast solution is not user-empowered. Furthermore, multicast only distributes data over the network and does not support processing of the data inside the network.

Thus we have previously come up with a user-empowered approach based on the user-empowered modular reflector [32], which can be run on a computer inside the network by any user, with or without administrative privileges. It supports not only data distribution but also processing inside the distribution infrastructure, including user-specific processing. However, its centralized architecture sacrifices scalability and robustness: it is only suitable for small user groups that use small- to medium-bandwidth streams, and its users cannot expect a high degree of robustness with respect to network and distribution infrastructure outages.

In this thesis, we have generalized the concept of the user-empowered modular reflector to the Active Element (AE) [40]. It supports both running distribution networks suitable for larger groups and distributing the AE itself over computational clusters, which provides scalability with respect to individual stream bandwidth that may be beyond the capacity of any single computer or processor.

AE Networks. The AE comprises the same basic modules as the reflector (messaging interfaces, listener and sender modules, a kernel with administrative functionality such as AAA, resource scheduling and limiting, and processors capable of processing data flowing through the AE), and it extends the architecture with two new modules: the network management module and the network information module. The former is used for creating and managing AE networks, while the latter provides information on both an individual AE and the network of AEs as a whole. In order to retain the user-empowered principle of operation, the AE network control layer is designed to run in peer-to-peer mode.

We have studied a number of distribution models suitable for data distribution inside AE networks, ranging from simple 2D full-mesh and layered 3D models to multiple spanning tree models. All of them were evaluated both in terms of scalability with respect to the number of supported clients and in terms of robustness of operation with respect to AE failure and network disintegration.

We have shown that AE networks are a suitable solution for building scalable, robust, user-empowered overlay networks for synchronous multimedia distribution with acceptable latency for larger groups of users. However, networks of AEs do not improve scalability with respect to the bandwidth of an individual stream, and thus it might be necessary to deploy distributed AEs when the bandwidth of an individual stream is beyond the capacity of an individual AE.

Distributed AE. When the individual AE itself is parallelized, the distributed AE comprises multiple equivalent AE units running in parallel. We have extended the model with two other parts, which are not necessarily part of the distributed AE itself: a data distribution unit, which distributes data over the multiple parallel paths in the distributed AE, and a data aggregation unit, which aggregates the resulting data from the parallel paths into one or more output network links. The reason for keeping the distribution and aggregation units generally independent of the distributed AE is that these units usually have to support higher-bandwidth streams than the individual parallel paths of the distributed AE, and thus they may be implemented as separate (e.g., hardware) units.

The distributed AE brings a new problem inherent to the fact that the data flow through multiple independent paths: packet reordering. It results either in packet loss (if delayed out-of-order packets are simply discarded) or, indirectly, in increased latency, as the application needs to buffer the data in order to sort the packets before actual processing begins. For cases where some explicit sending synchronization is needed to minimize the output packet reordering induced by the distributed AE, we have designed and evaluated the Fast Circulating Token (FCT) protocol, which provides limited synchronization among the sender modules of the parallel paths of the distributed AE. While the distributed AE with no explicit sending synchronization provides limited reordering, we have shown both theoretically and experimentally that FCT decreases the maximum egress packet reordering, sometimes by more than two orders of magnitude.
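The effect of parallel paths on packet order can be illustrated with a minimal sketch. It assumes round-robin distribution, one packet entering per time unit, and a fixed delay per path; these assumptions, the function names, and the simple displacement metric are ours for illustration only. The sketch does not model the FCT protocol or the measurements reported in this thesis.

```python
def egress_order(num_packets, path_delays):
    """Return sequence numbers in the order they leave the parallel paths.

    Assumption: packet i enters at time i and is sent round-robin to path
    i % k, which adds the fixed delay path_delays[i % k].
    """
    k = len(path_delays)
    arrivals = [(i + path_delays[i % k], i) for i in range(num_packets)]
    arrivals.sort()  # ties in arrival time are broken by sequence number
    return [seq for _, seq in arrivals]


def max_reordering(order):
    """Largest distance by which a late packet trails the highest number seen."""
    highest = -1
    worst = 0
    for seq in order:
        if seq < highest:
            worst = max(worst, highest - seq)
        highest = max(highest, seq)
    return worst


def reorder_buffer_size(order):
    """Peak number of packets a receiver holds while restoring the order."""
    expected = 0
    held = set()
    peak = 0
    for seq in order:
        held.add(seq)
        # Release every packet that is now in sequence.
        while expected in held:
            held.remove(expected)
            expected += 1
        peak = max(peak, len(held))
    return peak


if __name__ == "__main__":
    order = egress_order(10, [0, 5])
    print(order, max_reordering(order), reorder_buffer_size(order))
```

With equal path delays the egress order is the ingress order and no buffering is needed. With two paths whose delays differ by five time units, ten packets leave with a maximum displacement of 3 and the receiver has to hold up to 2 packets, which is the latency cost described above.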

Pilot Applications. In this thesis, we have demonstrated both medium-bandwidth pilot applications suitable for AE networks, like DV and HDV over IP, and high-bandwidth applications for distributed AEs, like uncompressed HD video over IP. We have shown that the performance and latency of the two modes of AE distribution are sufficient for the two respective classes of pilot applications. For example, we have demonstrated that the distributed AE is suitable for distribution of uncompressed HD video even using a purely software implementation.

10.1.2 Asynchronous processing

Distributed Encoding Environment. For distributed asynchronous processing, the latency requirements are far less strict than those of synchronous processing, and it turns out that this brings a different set of problems. Compared to synchronous processing, asynchronous processing usually involves much more complex processing and data transformations, often resulting in multiple different target media.

We have designed an efficient model for distributing asynchronous processing [39] that is capable of even very complex processing in real time or faster, depending on the degree of parallelism involved. Here, real time means that the parallel processing takes no more time than the duration of the source footage, not counting initial latency. The maximum parallelism is bounded by the minimum atomic unit that the source/target data can be split into. The model is based on creating jobs of uniform size for distributed computing nodes without shared memory, as available in Grid environments, and on a distributed storage infrastructure used as transient storage of source (and possibly also target) data.
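The relation between atomic units, uniform job size, and the real-time criterion can be sketched as follows. This is a simplified illustration, not the implementation described in this thesis: it assumes identical workers, one job per worker, a constant processing time per atomic unit, and no data transfer or startup cost. All function and parameter names are hypothetical.

```python
import math


def split_into_jobs(total_units, units_per_job):
    """Split total_units atomic units into consecutive jobs of uniform size.

    Returns half-open (start, end) ranges; only the last job may be shorter.
    """
    if total_units < 0 or units_per_job < 1:
        raise ValueError("total_units must be >= 0 and units_per_job >= 1")
    return [(start, min(start + units_per_job, total_units))
            for start in range(0, total_units, units_per_job)]


def units_per_job_for(total_units, workers):
    """Smallest uniform job size that yields at most `workers` jobs.

    The result never drops below one atomic unit, which bounds parallelism.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return max(1, math.ceil(total_units / workers))


def is_real_time(footage_seconds, seconds_per_unit, total_units, workers):
    """True if the longest job finishes within the duration of the footage.

    Assumption: identical workers, one job each, initial latency ignored.
    """
    longest_job = units_per_job_for(total_units, workers) * seconds_per_unit
    return longest_job <= footage_seconds


if __name__ == "__main__":
    size = units_per_job_for(10, 3)
    print(size, split_into_jobs(10, size))
    print(is_real_time(100, 30, 10, 3), is_real_time(100, 30, 10, 4))
```

Adding workers beyond the number of atomic units does not shrink the jobs any further, which reflects the bound on maximum parallelism stated above.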

We have analyzed the scheduling problems of this environment and found that while the most common processor scheduling problem, in which non-uniform jobs are scheduled to uniform processors, belongs to the NPO class, our problem with uniform jobs scheduled to non-uniform processors turns out to be a PO-class problem. When the distributed storage is connected to the computing infrastructure by a complete graph, which is a reasonable model of current high-speed networks, the problem of scheduling tasks to depots also belongs to the PO class, and thus scheduling as a whole belongs to the PO class under the above-mentioned conditions. We have also studied network resource scheduling and created a model of a Network Traffic Prediction Service that provides all the information needed for correct job scheduling.
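To illustrate why uniform jobs on non-uniform processors can be scheduled in polynomial time, the following sketch uses a standard greedy rule: each job goes to the processor that would complete it earliest. This is an illustrative assumption and not necessarily the algorithm used in this thesis; it also ignores the transfer of data between depots and processors. The names are hypothetical.

```python
import heapq


def schedule_uniform_jobs(num_jobs, seconds_per_job):
    """Assign identical jobs to processors of different speeds.

    seconds_per_job[p] is the time processor p needs for one job.
    Returns (makespan, jobs_per_processor). Runs in O(n log m) time
    for n jobs and m processors.
    """
    if not seconds_per_job:
        raise ValueError("at least one processor is required")
    counts = [0] * len(seconds_per_job)
    # Each heap entry is (finish time if this processor takes the next job, id).
    heap = [(t, p) for p, t in enumerate(seconds_per_job)]
    heapq.heapify(heap)
    makespan = 0
    for _ in range(num_jobs):
        finish, p = heapq.heappop(heap)
        counts[p] += 1
        makespan = max(makespan, finish)
        heapq.heappush(heap, (finish + seconds_per_job[p], p))
    return makespan, counts


if __name__ == "__main__":
    print(schedule_uniform_jobs(5, [1, 2]))
```

Because all jobs are identical, only the number of jobs per processor matters, so the search space is far smaller than for non-uniform jobs, where the choice of which job goes where makes the problem hard.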

In order to validate the models mentioned above, a prototype implementation, called the Distributed Encoding Environment, has been built on Grid computing infrastructure and the Internet Backplane Protocol (IBP) distributed storage infrastructure. It interacts with site-local scheduling via the PBS scheduler interface, and it can use IBP-enabled open-source video processing applications as well as closed-source applications with external IBP support. The prototype confirms the expected behavior and provides the expected performance.

Pilot Applications. The Distributed Encoding Environment is now used routinely by its pilot applications, most notably the processing of lecture recordings, which provides multi-terabyte archives of video material for educational purposes.