CHIP MULTIPROCESSING - Design and evaluation of a VLIW processor for real-time systems

also remove pipeline dependencies. If we have a five-stage pipeline with five hardware threads, it is possible to remove all forwarding logic, since each pipeline stage holds an instruction from a different thread and pipeline dependencies never occur. This is the fundamental idea of the work of Liu et al. (2012). However, if the number of active threads is less than the number of hardware threads available, the number of instructions per cycle is severely penalized. In a scenario with only one active thread and five hardware threads, the number of instructions per cycle is only 0.2 (one instruction every five cycles), since a new instruction can only be issued after the previous one has completed. Memory access in multithreaded systems also needs dedicated support to remove or reduce interference between threads.
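The 0.2 figure above follows directly from the issue pattern. The sketch below is a minimal model, not the design of Liu et al. (2012): it assumes a fixed round-robin in which each hardware thread owns one issue slot per rotation and the slots of inactive threads stay empty. The function names are hypothetical.

```python
from fractions import Fraction


def interleaved_ipc(active_threads: int, hardware_threads: int) -> Fraction:
    """Instructions per cycle of a thread-interleaved pipeline.

    Assumption: the pipeline depth equals the number of hardware threads,
    each thread owns one issue slot per rotation, and the slots of
    inactive threads are left empty.
    """
    if hardware_threads <= 0 or not 0 <= active_threads <= hardware_threads:
        raise ValueError("need 0 <= active_threads <= hardware_threads, hardware_threads > 0")
    return Fraction(active_threads, hardware_threads)


def simulate_issued(active_threads: int, hardware_threads: int, cycles: int) -> int:
    """Count the instructions issued over `cycles` cycles under the same assumption."""
    issued = 0
    for cycle in range(cycles):
        slot_owner = cycle % hardware_threads  # thread that owns this issue slot
        if slot_owner < active_threads:        # only active threads issue
            issued += 1
    return issued


if __name__ == "__main__":
    print(float(interleaved_ipc(1, 5)))  # one active thread out of five
    print(simulate_issued(1, 5, 100))    # instructions issued in 100 cycles
```

With all five threads active, the same model yields one instruction per cycle, which is why the penalty only appears when hardware threads are left idle.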

Some works like Schoeberl (2009b) criticize chip multithreading for real-time systems since its analysis is more complex and the replicated hardware could be used for chip multiprocessing.

2.8 CHIP MULTIPROCESSING

Most modern computer systems use single-chip processors with multiple cores. These cores commonly share a single main memory and rely on large, complex shared caches. Memory consistency between cores is usually managed by hardware, and complex on-chip networks support data transfer and consistency between all cores.

Real-time systems with chip multiprocessing are also viable, and this is a growing trend as the complexity of these systems increases. Unfortunately, the technology employed in general-purpose computing does not carry over to real-time systems because it increases average-case performance at the cost of predictability. Shared caches and complex hardware networks make WCET analysis infeasible. There are several works, like Schoeberl et al. (2015), dealing with multiprocessor designs for real time. In most approaches, Time Division Multiplexing (TDM) access protocols are accompanied by real-time networks, and data transfer between cores is explicitly managed.
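To illustrate why TDM arbitration helps WCET analysis, the sketch below computes the latency of a shared-memory access under a simple static slot schedule. It is a generic model and not the protocol of any cited work: it assumes each core owns one slot of `slot_cycles` cycles per period, a request is served only from the start of its core's slot, and service takes exactly one slot. All names are hypothetical.

```python
def tdm_completion_time(core: int, arrival: int, n_cores: int, slot_cycles: int) -> int:
    """Cycle at which a request from `core` issued at cycle `arrival` completes.

    Assumptions: core i owns the slot starting at i * slot_cycles in every
    period of n_cores * slot_cycles cycles, a request can only start at the
    beginning of its core's slot, and service takes one full slot.
    """
    period = n_cores * slot_cycles
    slot_start = core * slot_cycles
    offset = (arrival - slot_start) % period      # cycles since the last own slot start
    wait = 0 if offset == 0 else period - offset  # cycles until the next own slot start
    return arrival + wait + slot_cycles


def worst_case_latency(n_cores: int, slot_cycles: int) -> int:
    """Upper bound on access latency: the request just misses its own slot start."""
    return n_cores * slot_cycles - 1 + slot_cycles


if __name__ == "__main__":
    n_cores, slot_cycles = 4, 10
    period = n_cores * slot_cycles
    for core in range(n_cores):
        observed = max(
            tdm_completion_time(core, t, n_cores, slot_cycles) - t
            for t in range(period)
        )
        print(core, observed, worst_case_latency(n_cores, slot_cycles))
```

Under these assumptions the bound depends only on the number of cores and the slot length, not on what the other cores are doing, which is the property a WCET analysis needs.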

2.9 SUMMARY

In this chapter, we briefly surveyed some relevant processor aspects and their relation to real-time systems. These aspects are important for the subsequent chapters: some of them ensure predictability and performance, while others can severely jeopardize WCET analysis. Table 8 compares the techniques used by general-purpose processors with their alternatives for real-time systems.

Future real-time applications will need more performance, and a predictable pipeline is necessary to provide temporal instruction parallelism. When using pipelines, it is equally important to explore and optimize the number of stages and the hazard resolution, and to avoid unnecessary stall cycles so that the pipeline stays filled with useful work.

Spatial parallelism is also necessary, and it is the natural evolution of the standard pipeline. Fetching multiple instructions and executing them out of order when possible is a common technique in modern processors. Unfortunately, this leads to timing anomalies and should be avoided in real-time systems. Spatial parallelism can be achieved with several simple processors, with multithreaded execution, or with a VLIW design. The first approach must deal mainly with deterministic inter-processor communication and shared resources like main memory. The second must employ deterministic thread schedulers and must also deal with shared resources. The last one can achieve high performance with a powerful compiler. In the case of the VLIW approach, it is possible to design multithreaded systems, but chip multiprocessing is more feasible because multithreading support requires replicating critical hardware such as the register file, which is commonly a very large and critical component in VLIW processors.

Table 8 – Comparison of techniques for general-purpose and real-time processors

Feature | General purpose | Real time
Pipeline | Long with out-of-order execution | Simple with in-order execution
Caches | Complex with various levels, shared and complex associativities | Separate caches or scratchpad memory
Branch | Dynamic predictors | Static or no predictors
Superscalability | Multi-issue and out-of-order | Various simple cores or in-order multi-issue (VLIW)
Chip multithreading | Lots of hardware threads competing for shared resources | Avoid, use multiple cores or threads with interleaved pipeline
Chip multiprocessors | Shared caches with inter-core interference | Separate caches or scratchpads with predictable core communication

3 RELATED WORK

The design of deterministic computer architectures for real-time systems is very relevant, and there are several works in this research field. Since high-performance general-purpose processors are not suitable, new predictable architectures are needed that improve WCET analyzability without ignoring increasingly stringent performance requirements. The main concern is WCET analysis.

In this chapter we survey related work on predictable architectures. Compiler techniques, timing analysis techniques and architectural techniques are all important for the predictability of computer architectures. However, to limit the scope of this survey, we focus mainly on architectural techniques for hard real-time systems. Note also that all the predictability considerations described in Chapter 2 are taken into account for all related works.
