The special hardware features in OSIP improve its efficiency, but at the cost of reduced flexibility. Among these features, the OSIP-AGU and the node comparator are the most prominent and are discussed in this section.
3.5.1 OSIP Address Generation Unit (OSIP-AGU)
The OSIP-AGU, which introduces little hardware overhead, greatly speeds up memory accesses to the OSIP_DTs. This is possible because the OSIP_DTs have a regular structure of eight words each and are arranged in a static array, so an address can be generated with a shift by a constant and basic logic operations instead of arithmetic operations. However, this can become a programming limitation if more than eight words are needed for a node.
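The address generation described above can be sketched in C as follows. This is only an illustrative model, not the actual OSIP-AGU logic: the array length `DT_COUNT`, the base address `DT_ARRAY_BASE`, the word-addressed memory and the function name `agu_word_address` are assumptions made for this example. The sketch further assumes that the array base is aligned to the size of the whole array, so that the base, the shifted index and the word offset occupy disjoint bit fields and can be combined with OR instead of an adder.

```c
#include <assert.h>
#include <stdint.h>

/* All constants below are assumptions for illustration only. */
#define DT_WORDS_LOG2 3u                    /* eight words per OSIP_DT */
#define DT_WORDS      (1u << DT_WORDS_LOG2)
#define DT_COUNT      512u                  /* hypothetical array length */
#define DT_ARRAY_BASE 0x1000u               /* hypothetical base, aligned to
                                               DT_COUNT * DT_WORDS words */

/* Word address of word `offset` inside OSIP_DT number `index`.
 * The index is shifted by a constant; base, shifted index and offset
 * never overlap bitwise, so OR gives the same result as addition. */
uint32_t agu_word_address(uint32_t index, uint32_t offset)
{
    assert(index < DT_COUNT);
    assert(offset < DT_WORDS);
    return DT_ARRAY_BASE | (index << DT_WORDS_LOG2) | offset;
}
```

For instance, under these assumptions word 5 of OSIP_DT 2 maps to 0x1000 | 0x10 | 0x5 = 0x1015.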
Normally, eight words are sufficient for storing the necessary information in a node, whether it is a task node, a scheduling/mapping node or a PE node. In fact, the nodes still contain fields that are unused and reserved for future extensions. If more words are really needed, a workaround is required. For example, the node information can be distributed over two consecutive OSIP_DTs, so that the OSIP-AGU remains applicable. The OSIP software must then convert the index and word offset between the node and the OSIP_DT structure. If the number of words needed for the node is not a multiple of eight, some words are wasted.
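A minimal sketch of such a conversion is given below, assuming that one logical node occupies exactly two consecutive OSIP_DTs. The names `dt_location`, `node_to_dt` and `wasted_words` are hypothetical and do not come from the OSIP software; the second function only illustrates how many words are lost when the node size is not a multiple of eight.

```c
#include <assert.h>
#include <stdint.h>

#define OSIP_DT_WORDS 8u   /* words per OSIP_DT */
#define DTS_PER_NODE  2u   /* assumption: one logical node spans two OSIP_DTs */

/* Position of a word expressed in OSIP_DT terms (hypothetical type). */
typedef struct {
    uint32_t dt_index;   /* index of the OSIP_DT in the static array */
    uint32_t dt_offset;  /* word offset inside that OSIP_DT (0..7)   */
} dt_location;

/* Convert (logical node index, word offset within the node) into the
 * (OSIP_DT index, word offset) pair that the OSIP-AGU can handle. */
dt_location node_to_dt(uint32_t node_index, uint32_t node_offset)
{
    dt_location loc;
    assert(node_offset < DTS_PER_NODE * OSIP_DT_WORDS);
    loc.dt_index  = node_index * DTS_PER_NODE + node_offset / OSIP_DT_WORDS;
    loc.dt_offset = node_offset % OSIP_DT_WORDS;
    return loc;
}

/* Words left unused when a node needing `words_needed` words is stored
 * in whole OSIP_DTs. */
uint32_t wasted_words(uint32_t words_needed)
{
    uint32_t dts = (words_needed + OSIP_DT_WORDS - 1u) / OSIP_DT_WORDS;
    return dts * OSIP_DT_WORDS - words_needed;
}
```

With these definitions, word 10 of logical node 3 is found at word 2 of OSIP_DT 7, and a node that needs 11 words leaves 5 of its 16 words unused.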
On the other hand, if less information is needed for a node, the designer cannot reduce the memory usage by reducing the node size when the OSIP-AGU is used. This also wastes memory words.
A more generic way of implementing the OSIP-AGU would be to use a hardware multiplier, which multiplies the node index by the node size (i.e., the number of words per node) to calculate the base address of a node. The word offset is then added to the base address to obtain the final word address. In comparison with the current OSIP-AGU, this implementation would occupy a larger area in the pipeline and possibly worsen the timing. However, for certain applications, it could enable better memory utilization and offer higher flexibility. Hence, this is a trade-off to be considered.
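The multiplier-based alternative can be modeled in the same spirit. The sketch below is a plain C illustration of the arithmetic, not a description of any existing hardware; the function names and the separate `array_base` parameter are assumptions. The helper `array_words` merely shows the memory footprint of a node array for a given node size.

```c
#include <assert.h>
#include <stdint.h>

/* Generic address generation: a multiplication by the node size and two
 * additions replace the constant shift and the OR operations. */
uint32_t generic_word_address(uint32_t array_base, uint32_t node_size,
                              uint32_t index, uint32_t offset)
{
    assert(node_size > 0u);
    assert(offset < node_size);
    return array_base + index * node_size + offset;
}

/* Total number of words occupied by `node_count` nodes of `node_size` words. */
uint32_t array_words(uint32_t node_count, uint32_t node_size)
{
    return node_count * node_size;
}
```

As an example of the trade-off, 64 nodes that need only five words each occupy 320 words with the generic scheme, compared with 512 words when every node is fixed to eight words.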
3.5.2 Node Comparator
The node comparator is another hardware feature that can limit flexibility when the cmp_node and cmp_node_e instructions are used. Naturally, a hardware node comparator cannot cover all possible comparison rules and their combinations, although the current comparator already supports a fairly wide range of rules. If new rules are to be applied, they must be implemented in software in the OSIP scheduling algorithms. To still be able to use the two special instructions for node comparison, an additional flag can be introduced in the software to distinguish the currently supported rules from the new ones.
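The flag-based distinction could look roughly like the following sketch. Everything in it is hypothetical: the node fields, the two rule identifiers, the assumption about which rule the hardware covers, and the function `hw_cmp_node_model`, which is only a plain C stand-in for the cmp_node instruction and says nothing about its real semantics or operands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node fields; the real OSIP_DT layout is not reproduced. */
typedef struct {
    uint32_t priority;
    uint32_t deadline;
} sched_node;

/* Hypothetical comparison rules. */
typedef enum {
    RULE_HIGHER_PRIORITY,      /* assumed to be covered by the hardware */
    RULE_DEADLINE_THEN_PRIORITY /* assumed new rule, software only      */
} cmp_rule;

/* Software flag: does the hardware comparator support this rule? */
bool rule_is_hw_supported(cmp_rule rule)
{
    return rule == RULE_HIGHER_PRIORITY;
}

/* Stand-in for the cmp_node instruction (C model, assumption). */
bool hw_cmp_node_model(const sched_node *a, const sched_node *b)
{
    return a->priority > b->priority;
}

/* Software implementation of the new rule: the earlier deadline wins,
 * ties are broken by the higher priority. */
bool sw_cmp_node(const sched_node *a, const sched_node *b)
{
    if (a->deadline != b->deadline)
        return a->deadline < b->deadline;
    return a->priority > b->priority;
}

/* Returns true if node a should be placed before node b under `rule`. */
bool node_precedes(cmp_rule rule, const sched_node *a, const sched_node *b)
{
    assert(a != NULL && b != NULL);
    return rule_is_hw_supported(rule) ? hw_cmp_node_model(a, b)
                                      : sw_cmp_node(a, b);
}
```

In this arrangement the fast path is kept for every rule the comparator already supports, and only the new rules pay the cost of a software comparison.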
Another approach to implementing a flexible comparator, which at the same time can cover an even larger range of rules in hardware, is to use a Coarse-Grained Reconfigurable Architecture (CGRA) [39, 76, 129, 172], in which different rules can be configured statically or dynamically.
System-Level Analysis of OSIP Efficiency
As a central task manager, OSIP has a large impact on the system performance through its efficiency. The previous chapter made a preliminary analysis of the OSIP efficiency by analyzing the execution time of the most critical OSIP commands for scheduling and mapping tasks. In comparison to a RISC-based task manager, OSIP executes these commands in much less time. However, from the system perspective, this analysis is isolated and rather one-sided, as it does not show how the OSIP efficiency influences the overall system performance.
The performance of a complete system depends on many factors. In addition to the task manager, the performance of the PEs, the task sizes and the communication architecture are especially important. These factors need to be investigated jointly in order to analyze the OSIP efficiency in a system context. In this chapter, a thorough characterization of the performance of OSIP is provided from the system point of view. A special focus is laid on the joint impact of the communication architecture and the OSIP efficiency, as the communication architecture has become one of the dominant factors for the performance of modern MPSoCs.
This chapter is organized as follows. First, the system setup for the analysis and the benchmarking applications (a synthetic application and a real-life H.264 video decoding application) are introduced. Then, the OSIP efficiency is analyzed in systems in which the communication overhead is excluded by idealizing the communication architecture. Afterwards, the impact of the communication architecture on the OSIP-based system performance is highlighted. Following this, optimized realistic communication architectures are presented. Based on the resulting communication architectures, the impact of the OSIP efficiency and the communication architecture on the system performance is jointly investigated. Finally, the OSIP efficiency from the system perspective is summarized.1
4.1 System Setup
For evaluating the OSIP efficiency at the system-level, a virtual platform is built using Synopsys Platform Architect. The platform consists of several instruction-accurate ARM926EJ-S processor models, an OSIP model, a shared memory and some peripherals such as input stream, virtual LCD and UART. Without loss of generality, the clock frequency of the system is set to 100 MHz during the analysis.
[Figure 4.1: Setup of baseline system. Blocks shown: ARM0, ARM1, ..., ARMn, AHB, OSIP, Memory, Peripherals.]
1 Portions of this chapter have been published by the author in [214] in the International Journal of Embedded and Real-Time Communication Systems, edited by Seppo Virtanen. Copyright 2011, IGI Global, www.igi-global.com. Posted by permission of the publisher.
In the platform, all components are SystemC-based. Three OSIP models (UT-OSIP, LT-OSIP and OSIP), which are introduced in Section 3.2.1, are employed in the system alternately for comparison. A SystemC wrapper is created for the OSIP models, modeling the behavior of the register interface and the interrupt interface.
The communication architecture of the system is AHB-based, and all system components are connected to the bus. Round-robin is selected as the bus arbitration scheme. As will be shown later, the communication architecture is extended and optimized step by step. A simplified overview of the system is given in Figure 4.1. For clarity, the transactors that translate the communication protocol between the system components, as well as the clock and reset generators, are not shown in the figure. In this system, all slave components of the bus (OSIP, shared memory and peripherals) can be accessed by all ARM processors through the bus.
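For readers unfamiliar with round-robin arbitration, the scheme can be illustrated with the following generic sketch. It is a textbook formulation and not the arbiter of the AHB model used in the platform; the request bit mask, the function name `round_robin_grant` and the `NO_GRANT` value are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define NO_GRANT (-1)   /* returned when no master requests the bus */

/* Generic round-robin arbitration.
 * requests:    bit i is set if master i currently requests the bus
 * last_grant:  index of the master that was granted most recently
 * num_masters: number of masters on the bus (1..32)
 * The search starts at the master after the last granted one and wraps
 * around, so the last granted master has the lowest priority. */
int round_robin_grant(uint32_t requests, int last_grant, int num_masters)
{
    int step;
    assert(num_masters > 0 && num_masters <= 32);
    assert(last_grant >= 0 && last_grant < num_masters);
    for (step = 1; step <= num_masters; ++step) {
        int candidate = (last_grant + step) % num_masters;
        if (requests & (UINT32_C(1) << candidate))
            return candidate;
    }
    return NO_GRANT;
}
```

With four masters, of which masters 0 and 2 are requesting, the grant alternates between the two: after master 0 has been served, master 2 is granted next, and vice versa.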