Software architecture for AMP - X-RT: A portable framework for real-time scheduling

3. X-RT: A portable framework for real-time scheduling

3.5. Software architecture for AMP

SMP has become a de facto standard in many areas of the real-time domain, such as process control and industrial automation, owing to the high flexibility offered by its uniform programming model.

However, it has a major drawback in terms of hardware requirements: dedicated hardware resources are required to support interlocked operations, cache coherence and the other mechanisms typical of SMP systems. Such resources can prove very costly in small-scale, low-power embedded systems because of their area and power requirements.

For these reasons, in many small-scale and highly integrated embedded real-time systems, AMP proves to be the leading choice, despite its more complex and less flexible computational model.

Furthermore, unlike SMP, where the multiprocessing model is largely uniform and independent of the specific architecture that implements it (for instance, the software programming model of x86/amd64 multicore processors is not that different from that of multicore SPARC or PowerPC processors), AMP usually does not identify a precise model, but rather a variety of different, application-specific architectures that often translate into very different software programming models.

The common principles that typically hold across the different AMP architectures mostly relate to the resulting software organization, which consists of several distinct operating system instances, each running independently on its own processor.

Reference architecture

Before extending the design considerations made so far for SMP to AMP systems, it is necessary to state some more detailed assumptions about the underlying AMP architecture targeted by the X-RT framework.

Considering that the main target of this work is industrial embedded real-time systems, the choice fell on multiprocessor systems on programmable chip (MP-SoPCs).

Originally used as prototyping platforms for later implementation in ASIC, FPGAs have become feasible vehicles for final designs, enabling an agile integration of manifold hardware resources, such as general-purpose processors (soft-cores), memory and peripheral devices, suitably interconnected via a customizable bus. Currently available design tools leave the designer a high degree of freedom, particularly as regards the inter-processor communication infrastructure and the memory layout. Customization options typically involve not only the choice of memory technology, which can range from fast on-chip memory to external DRAM, but also the interconnection topology: a memory can be tightly coupled to a single core, avoiding any contention, or shared among cores, sacrificing access time in favor of resource saving [ANJ+2004, TAK2006].

However, due to the intrinsic nature of FPGAs, the computational performance of soft-cores, whose clock rates are in the order of hundreds of MHz, is often limited (though still remarkable) compared to specialized hardware such as modern PC processors. On the other hand, the extreme flexibility of modern SoPC platforms allows many soft-cores to be instantiated on a single FPGA, so that the computational power of the resulting platform can be scaled to the user's needs by means of multicore hardware parallelism.

Nevertheless, exploiting such multicore platforms so as to concretely take advantage of the hardware parallelism is, in general, a non-trivial task that requires particular care in both the software design and development stages. The complexity is far beyond that of traditional uniprocessor scenarios, which are undoubtedly more familiar to most embedded application developers. In this sense, if we narrow the focus to more specific and constrained scenarios, such as the embedded real-time one, the introduction of an ad-hoc infrastructure that properly masks the underlying complexity can make such SoPC multicore platforms feasible and reliable solutions, as for instance in [BBCG2008, XWB2007].

More technical specifications of the soft-core architecture used for the evaluation of this work are deferred to the performance evaluation section. For the moment, the only architectural assumptions made for AMP are: (i) the availability of a shared memory, used only to hold the working sets of the real-time tasks; (ii) a hardware inter-processor communication mechanism that allows the software instances running on the different processors to exchange messages.

The AMP architecture, furthermore, imposes additional restrictions on the scheduling algorithms that can be employed. In particular, unlike in SMP systems, process migration cannot be performed. For this reason, for AMP we consider the restricted-migration variant of the G-EDF scheduling policy, namely R-EDF [BC2003].
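The restricted-migration constraint can be illustrated with a minimal C sketch. It is not taken from the X-RT sources: the type `job_t` and the functions `job_release`, `job_may_run_on` and `job_dispatch` are hypothetical names introduced here only to show the rule that a job, once started on a processor, may be resumed only there, while a later job of the same task is free to start elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_CPU (-1)

/* Hypothetical per-job bookkeeping; not part of the X-RT API. */
typedef struct {
    int  task_id;
    int  bound_cpu;   /* NO_CPU until the job executes for the first time */
    bool completed;
} job_t;

/* A newly released job is not bound to any processor yet, so successive
 * jobs of the same task may start on different processors. */
void job_release(job_t *job, int task_id)
{
    job->task_id   = task_id;
    job->bound_cpu = NO_CPU;
    job->completed = false;
}

/* Restricted migration: a job may be placed on `cpu` only if it has never
 * run, or if it already started on that same processor. */
bool job_may_run_on(const job_t *job, int cpu)
{
    assert(cpu >= 0);
    if (job->completed)
        return false;
    return job->bound_cpu == NO_CPU || job->bound_cpu == cpu;
}

/* Record a dispatch; the first one binds the job to `cpu`, later ones
 * succeed only on the same processor. */
bool job_dispatch(job_t *job, int cpu)
{
    if (!job_may_run_on(job, cpu))
        return false;
    job->bound_cpu = cpu;
    return true;
}
```

In this sketch a scheduling plug-in would call `job_may_run_on` before assigning a pre-empted job to a processor; how the actual R-EDF plug-in enforces the constraint is not specified here.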

Design of the AMP version of the X-RT framework.

The fundamental contribution of the X-RT framework in the AMP case is the underlying run-time model, herein called the shadow process model. Its purpose is to realize a 1-to-m mapping of periodic real-time tasks onto processes of the m RTOS instances and to manage their execution flow in a centralized manner, according to the global decisions of the scheduling plug-in, thus providing a task-level migration abstraction even though RTOS processes cannot actually be migrated.

From the software standpoint, each shadow process consists of an instance of the task-shell, which operates in the same way as in the SMP version. The main difference is that in this new model each real-time task is associated with m task-shells, one for each processor. As in the SMP case, the X-RT framework on AMP requires only three priority levels (LOW, MEDIUM, HIGH), which keep the same semantics.

At any time, at most one shadow process is ready for execution (from the RTOS scheduler viewpoint) with MEDIUM priority on each of the m RTOS instances. This shadow process corresponds to the real-time task that is expected to execute on that processor by the infrastructure. Therefore, keeping the assumption that each RTOS scheduler follows a strict priority-driven policy, no ambiguity can exist as regards the overall set of tasks running on the system at any time.
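The invariant just stated can be sketched in C as follows. The table layout and the names `shadow_init`, `shadow_select` and `shadow_ready_count` are hypothetical and serve only as an illustration; the sizes `M_CPUS` and `N_TASKS` are arbitrary. The sketch additionally assumes that a task is never selected on two processors at once, and it does not model jobs or the restricted-migration rule.

```c
#include <assert.h>
#include <stdbool.h>

#define M_CPUS   3      /* assumed number of processors / RTOS instances */
#define N_TASKS  4      /* assumed number of real-time tasks */
#define NO_TASK  (-1)

/* ready[t][c]: the shadow process of task t on processor c is ready
 * (with MEDIUM priority) from the local RTOS scheduler's viewpoint. */
static bool ready[N_TASKS][M_CPUS];
/* selected[c]: task currently expected to execute on processor c. */
static int  selected[M_CPUS];

void shadow_init(void)
{
    for (int c = 0; c < M_CPUS; c++) {
        selected[c] = NO_TASK;
        for (int t = 0; t < N_TASKS; t++)
            ready[t][c] = false;
    }
}

/* Make the shadow of `task` the only ready shadow on `cpu`.
 * Fails if the task is currently selected on another processor. */
bool shadow_select(int task, int cpu)
{
    assert(task >= 0 && task < N_TASKS);
    assert(cpu >= 0 && cpu < M_CPUS);

    for (int c = 0; c < M_CPUS; c++)
        if (c != cpu && selected[c] == task)
            return false;

    if (selected[cpu] != NO_TASK)
        ready[selected[cpu]][cpu] = false;   /* park the previous shadow */
    ready[task][cpu] = true;
    selected[cpu] = task;
    return true;
}

/* Number of ready shadows on `cpu`; the invariant requires at most one. */
int shadow_ready_count(int cpu)
{
    int n = 0;
    for (int t = 0; t < N_TASKS; t++)
        if (ready[t][cpu])
            n++;
    return n;
}
```

Because every selection first parks the previously selected shadow, `shadow_ready_count` can never exceed one on any processor, which is what removes any ambiguity about the set of running tasks under a strict priority-driven RTOS scheduler.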

AMP compliance is ensured because the restricted-migration model guarantees that a pre-empted task cannot be resumed on any processor other than the one where the execution of its current job started; therefore no migration of process context is required. The only state that the infrastructure must keep coherent is the working set of each task, which might be accessed (at different times) by distinct jobs of the same task on different processors. This point is discussed further in Section 3.6.

The X-RT framework is deployed in a distributed fashion on AMP: the metascheduler, together with the scheduling policy plug-in, is executed exclusively on one of the m processors (which can also host scheduled real-time tasks). The scheduling policy plug-in takes global scheduling decisions in the AMP version of X-RT as well (subject, though, to the additional restricted-migration constraint). The interface between the metascheduler and the policy plug-in remains unchanged, so the plug-in remains completely unaware of the underlying process model and of the SMP/AMP architecture.

The task-shells, conversely, are distributed over the m processors. Their operating principle, however, is the same. In this sense, the decoupled architecture of the framework, and in particular the message-passing strategy employed to coordinate the metascheduler and the task-shells, shows its greatest advantages in the AMP scenario, where coordination based on traditional shared-memory patterns would be completely unfeasible.

An extra component, however, is required to put the shadow process model into operation. Because of the inherent run-time isolation between the RTOS instances, the metascheduler is no longer able to directly instantiate the wrapper processes when new real-time tasks are created. For this reason, a further component, the dispatcher, is introduced. The framework requires m dispatchers, one pre-loaded on each processor after the initialization of the corresponding RTOS instance (Figure 33). From the run-time viewpoint, each dispatcher acts as a local proxy for the centralized metascheduler. The interaction between the metascheduler and the dispatchers is, once again, realized by means of (inter-processor) message passing.

In this regard, a new message (MSG_CREATE_SHADOW_PROCESS) is introduced. It is sent by the metascheduler to the dispatchers when the creation of a new periodic task is requested through the XRT_CreateNewPeriodicTask method of the X-RT API.
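The interaction between the metascheduler and the dispatchers can be sketched in C under several assumptions. Only the message name MSG_CREATE_SHADOW_PROCESS comes from the text; its numeric value, the message layout `xrt_msg_t`, the software mailboxes standing in for the hardware inter-processor communication mechanism, and the functions `metascheduler_create_task` and `dispatcher_poll` are hypothetical and do not reproduce the actual X-RT implementation.

```c
#include <assert.h>
#include <stddef.h>

#define M_CPUS        3   /* assumed number of processors */
#define MAX_TASKS     8   /* assumed upper bound on task identifiers */
#define MAILBOX_DEPTH 8   /* assumed mailbox capacity */

/* Message name from the text; the numeric value is an assumption. */
typedef enum { MSG_CREATE_SHADOW_PROCESS = 1 } msg_type_t;

/* Hypothetical message layout. */
typedef struct {
    msg_type_t type;
    int        task_id;
} xrt_msg_t;

/* Software FIFO standing in for the hardware IPC mechanism. */
typedef struct {
    xrt_msg_t slots[MAILBOX_DEPTH];
    size_t    head, count;
} mailbox_t;

static mailbox_t mailbox[M_CPUS];                /* one mailbox per dispatcher */
static int local_shadows[M_CPUS][MAX_TASKS];     /* 1 if a shadow exists locally */

static int mailbox_push(mailbox_t *mb, xrt_msg_t msg)
{
    if (mb->count == MAILBOX_DEPTH)
        return -1;                               /* mailbox full */
    mb->slots[(mb->head + mb->count) % MAILBOX_DEPTH] = msg;
    mb->count++;
    return 0;
}

static int mailbox_pop(mailbox_t *mb, xrt_msg_t *out)
{
    if (mb->count == 0)
        return -1;                               /* mailbox empty */
    *out = mb->slots[mb->head];
    mb->head = (mb->head + 1) % MAILBOX_DEPTH;
    mb->count--;
    return 0;
}

/* Metascheduler side: ask every dispatcher to create a shadow process.
 * Returns -1 on the first full mailbox (earlier messages stay queued). */
int metascheduler_create_task(int task_id)
{
    xrt_msg_t msg = { MSG_CREATE_SHADOW_PROCESS, task_id };
    for (int cpu = 0; cpu < M_CPUS; cpu++)
        if (mailbox_push(&mailbox[cpu], msg) != 0)
            return -1;
    return 0;
}

/* Dispatcher side: drain the local mailbox and create the requested
 * shadow processes. Returns the number of shadows created. */
int dispatcher_poll(int cpu)
{
    assert(cpu >= 0 && cpu < M_CPUS);
    xrt_msg_t msg;
    int created = 0;
    while (mailbox_pop(&mailbox[cpu], &msg) == 0) {
        if (msg.type == MSG_CREATE_SHADOW_PROCESS &&
            msg.task_id >= 0 && msg.task_id < MAX_TASKS) {
            /* Stand-in for creating a task-shell process on the local RTOS. */
            local_shadows[cpu][msg.task_id] = 1;
            created++;
        }
    }
    return created;
}
```

In this sketch one request to the metascheduler fans out into m messages, and each dispatcher creates its local shadow process independently, which mirrors the local-proxy role described above.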

Figure 33: Software and memory organization of the X-RT framework on AMP.

[Figure 33, diagram not reproduced. Recoverable labels: processors 1, 2, ..., m, each with an RTOS instance on a System Abstraction Layer, a dispatcher, and one task shell per task body (Task 1 to Task n); the metascheduler (MS) on one of the processors. Each processor's private memory holds the RTOS memory (code + data + stack), the X-RT library (code + data + stack) and the code and stacks of Task 1 to Task n. The shared memory holds the working sets (data) of Task 1 to Task n.]

In document Hardware/Software Design of Dynamic Real-Time Schedulers for Embedded Multiprocessor Systems (Page 103-108)