also remove pipeline dependencies. If we have a five-stage pipeline with five hardware threads, it is possible to remove all forwarding logic, since each pipeline stage executes an instruction from a different thread and pipeline dependencies never occur. This is the fundamental idea of the work of Liu et al. (2012). However, if the number of activated threads is less than the number of hardware threads available, the number of instructions per cycle is severely penalized. Considering a scenario with only one activated thread and five hardware threads, the number of instructions per cycle is only 0.2, since a new instruction can only be issued after the previous one has completed. Memory access in multithreaded systems also needs special treatment to remove or reduce thread interference.
Some works, such as Schoeberl (2009b), criticize chip multithreading for real-time systems because its analysis is more complex and the replicated hardware could instead be used for chip multiprocessing.
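The 0.2 figure can be reproduced with a minimal sketch of a strictly round-robin interleaved pipeline. The function name `interleaved_ipc` and the assumption that each cycle is statically bound to one hardware-thread slot (and is wasted when that slot holds no active thread) are illustrative choices, not the design of Liu et al. (2012).

```python
def interleaved_ipc(hardware_threads: int, active_threads: int,
                    cycles: int = 1000) -> float:
    """Estimate instructions per cycle of a round-robin interleaved pipeline.

    Assumption: cycle c belongs to hardware-thread slot c % hardware_threads,
    and an instruction is issued only if that slot holds an active thread.
    """
    if not 0 <= active_threads <= hardware_threads:
        raise ValueError("active_threads must be within [0, hardware_threads]")

    issued = 0
    for cycle in range(cycles):
        slot = cycle % hardware_threads
        # Assumption: the active threads occupy the first slots.
        if slot < active_threads:
            issued += 1
    return issued / cycles


if __name__ == "__main__":
    print(interleaved_ipc(5, 1))  # one active thread out of five: 0.2
    print(interleaved_ipc(5, 5))  # all five threads active: 1.0
```

Under this model the throughput is simply the ratio of active threads to hardware threads, which is why an under-populated interleaved pipeline loses so much performance.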
2.8 CHIP MULTIPROCESSING
Most modern computer systems use single-chip processors with multiple cores. These cores commonly share a single main memory and are built with large, complex shared caches. Memory consistency between cores is usually managed by hardware, and complex on-chip networks support data transfer and consistency between all cores.
Real-time systems with chip multiprocessing are also viable, and this is a trend as the complexity of these systems grows. Unfortunately, the technology employed in general-purpose computing does not carry over to real-time systems because it increases average-case performance at the expense of predictability. Shared caches and complex hardware networks make WCET analysis infeasible. Several works, such as Schoeberl et al. (2015), deal with multiprocessor designs for real-time systems. In most approaches, time-division multiplexing (TDM) access protocols are accompanied by real-time networks, and data transfer between cores is explicitly managed.
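To illustrate why TDM arbitration helps WCET analysis, the sketch below models a shared memory with fixed, equally sized slots assigned to cores in round-robin order. The function names, the rule that an access may only start at the beginning of the requesting core's slot, and the rule that it occupies exactly one slot are assumptions made for illustration; they do not describe the protocol of any cited work.

```python
def tdm_slot_owner(cycle: int, cores: int, slot_len: int) -> int:
    """Return the core that owns the memory slot active at `cycle`."""
    return (cycle // slot_len) % cores


def tdm_wait(cycle: int, core: int, cores: int, slot_len: int) -> int:
    """Cycles a request issued at `cycle` waits for the start of `core`'s slot.

    Assumption: an access can only begin at the first cycle of the slot.
    """
    period = cores * slot_len      # length of one full TDM round
    slot_start = core * slot_len   # offset of this core's slot in the round
    return (slot_start - cycle) % period


def tdm_worst_case_latency(cores: int, slot_len: int) -> int:
    """Worst-case cycles from request to completion for any core.

    The bound depends only on the schedule, not on what other cores do.
    """
    period = cores * slot_len
    worst_wait = max(tdm_wait(c, 0, cores, slot_len) for c in range(period))
    return worst_wait + slot_len   # assumption: the access takes one slot


if __name__ == "__main__":
    # Four cores, three-cycle slots: a request that just misses its slot
    # waits almost a full round before it is served.
    print(tdm_worst_case_latency(cores=4, slot_len=3))
```

Because the bound is a function of the number of cores and the slot length only, each core can be analyzed in isolation, at the cost of idle slots when a core has nothing to transfer.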
2.9 SUMMARY
In this chapter, we briefly surveyed some relevant processor aspects and their relation to real-time systems. These aspects are important for the subsequent chapters: some of them ensure predictability and performance, while others can severely jeopardize WCET analysis capabilities. Table 8 compares the techniques used by general-purpose processors with their alternatives for real-time systems.
Future real-time applications will need performance, and a predictable pipeline is necessary to provide temporal instruction parallelism. When using pipelines, it is equally important to explore and optimize the number of stages and the hazard resolution, and to avoid unnecessary stall cycles so that the pipeline stays filled with useful work.
Spatial parallelism is also necessary and it is the natural evolution of the standard pipeline. Fetching multiple instructions and executing them out of order when possible is a common technique employed in modern processors. Unfortunately, this leads to timing anomalies and should be avoided for real-time systems. Spatial parallelism can be achieved by using several simple processors, multithreaded execution or a VLIW design. The first approach must deal mainly with deterministic inter-processor communication and shared resources such as main memory. The second must employ deterministic thread schedulers and also deals with shared resources. The last one can achieve high performance with a powerful compiler. In the case of the VLIW approach, it is possible to design multithreaded systems, but chip multiprocessing is more feasible because multithreading support requires replicating critical hardware such as the register file, which is commonly a very large and critical component in VLIW processors.
Table 8 – Comparison of techniques for general-purpose and real-time processors
| Feature              | General purpose                                                 | Real time                                                          |
|----------------------|-----------------------------------------------------------------|--------------------------------------------------------------------|
| Pipeline             | Long with out-of-order execution                                | Simple with in-order execution                                     |
| Caches               | Complex with various levels, shared and complex associativities | Separate caches or scratchpad memory                               |
| Branch               | Dynamic predictors                                              | Static or no predictors                                            |
| Superscalability     | Multi-issue and out-of-order                                    | Various simple cores or in-order multi-issue (VLIW)                |
| Chip multithreading  | Lots of hardware threads competing for shared resources         | Avoid, use multiple cores or threads with interleaved pipeline     |
| Chip multiprocessors | Shared caches with core inter-interference                      | Separated caches or scratchpads with predictable core communication |
3 RELATED WORK
The design of deterministic computer architectures for real-time systems is very relevant and there are several works in this research field. Since high-performance general-purpose processors are not suitable, new predictable architectures are desired that enhance WCET analyzability without ignoring increasingly stringent performance requirements. The main concern is related to WCET analysis.
In this chapter we survey related work on predictable architectures. Compiler techniques, timing analysis techniques and architectural techniques are all important for the predictability of computer architectures. However, to limit the scope of this survey, we focus mainly on architectural techniques for hard real-time systems. Note also that the predictability considerations described in Chapter 2 are taken into account for all related works.