
The Future: System Design with Customizable Architectures, Software, and Tools

Review Questions

Bibliography

2.1 Introduction

Embedded system development seeks ever more efficient processors and new automation methodologies to match the increasingly complex requirements of modern embedded applications. Increasing effort is invested in accelerating both the exploration and implementation of embedded processor architectures and the optimization of the software applications that run on the target architecture. Special-purpose devices often require application-specific hardware design to meet tight cost, performance and power constraints. However, flexibility is as important as efficiency: it allows embedded system designs to be easily modified or enhanced in response to evolving standards, market shifts, or even user requirements, and such changes may occur during the design cycle or even after production. Hence the various implementation alternatives for a given function, ranging from custom-designed hardware to software running on embedded processors, provide a system designer with differing degrees of efficiency and flexibility. Often these two are conflicting design goals: efficiency is obtained through custom hardwired implementations, whereas flexibility is best provided through programmable implementations.

Unfortunately, even with sophisticated design methodologies and tools, the high cost of hardware design limits both the rapid development of application-specific solutions and the amount of architectural exploration that can actually be done. Taking new, emerging technologies and putting them on silicon is a great challenge. Complexity has grown to the point where integrating and verifying hardware and software components requires more and more time, which delays bringing new chips to market.

Recent advances in processor synthesis technology can reduce the time and cost of creating application-specific processing elements. This enables a much more software-centric development approach. A greater percentage of software development can occur up front, and architectures can be better optimized from real software workloads. Application-specific processors can be synthesized to meet the performance needs of functional subsystems while maximizing the programmability of the final system. Essentially, the hardware is adapted to software rather than the other way around.

Configurable processing combines elements from both traditional hardware and software development approaches by incorporating customized, application-specific compute resources into the processor’s architecture. These compute resources become additional functional engines or accelerators that are accessible to the designer through custom instructions. Configurable processors offer significant performance gains by exploiting data parallelism through wide paths to memory; operator specialization such as bit-width optimization, constant folding and partial evaluation; and temporal parallelism through the use of deep pipelines.
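How much of a kernel-level gain survives at the application level depends on how much of the run time the accelerated hot spot accounts for. The sketch below uses Amdahl's law, a standard model that the text itself does not invoke, to estimate this. The function name and the profile figures (80% of cycles, 8x kernel speedup) are hypothetical and serve only as an illustration.

```python
def overall_speedup(hot_fraction, kernel_speedup):
    """Amdahl's law: whole-program speedup when only the hot spot is accelerated.

    hot_fraction   -- share of baseline cycles spent in the hot spot (0..1)
    kernel_speedup -- factor by which a custom instruction speeds up the hot spot
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # The untouched part keeps its cost; the hot spot shrinks by kernel_speedup.
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / kernel_speedup)


# Hypothetical profile: the kernel takes 80% of the cycles and a custom
# instruction makes it 8x faster.
print(round(overall_speedup(0.80, 8.0), 2))
```

Under these assumed numbers the application as a whole speeds up by roughly 3.3x, not 8x, which is why locating the true hot spots matters before committing hardware to a custom instruction.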

In general, three approaches have historically been followed in designing an embedded system-on-chip (SoC). The first is a purely software-centric approach: applications are mapped to an SoC or multiprocessor SoC (MPSoC) and optimized for performance, power consumption, or real-time response. With advanced compiler technology, system designers can often leverage knowledge of how to squeeze the ultimate performance out of a specified architecture. Although the C language, widely used in developing embedded applications, does not natively support parallelism, parallelizing compilers can give a significant advantage in exploiting MPSoC architectures. Compiler technology can even recognize and vectorize data arrays so that they are handled by the SIMD (single instruction, multiple data) memory-to-memory architectures of certain SoCs.

The second approach is the design of application-specific hardware to achieve high-speed embedded systems with varying levels of programmability. Although application-specific integrated circuits (ASICs) offer much higher performance and lower power consumption, they are not flexible and involve an expensive and time-consuming design process. Finally, the third and more recent approach is to develop the hardware and software architecture of a system in parallel, so as to obtain more flexibility than ASICs offer for a specific problem domain. Though not as efficient as ASICs, custom-instruction processors are emerging as a promising solution in the hardware/software codesign of embedded systems. Configurable and extensible processors offer a favorable trade-off between efficiency and flexibility, while keeping design turn-around times shorter than those of fully custom designs.

Application-specific instruction-set processors (ASIPs) fill the architectural spectrum between general-purpose programmable processors and dedicated hardware or ASIC cores (as depicted in Figure 2.1). They allow one to effectively combine the best of both worlds, i.e., high flexibility through software programmability and high performance (high throughput and low power consumption).

The key to customization of an embedded system architecture is the ability to expand the core processor instruction set, and possibly the register files and execution pipelines. Since application developers, in addition to developing the application, must also tailor the embedded system and discover the critical processor hotspots for the specific application, it is crucial to use an automated framework. Hence, it has become increasingly important to also provide automated software support for extending the processor features. Given a source application, researchers aim to provide a compiler/synthesis tool for a customizable SoC that can by itself generate the most cost-efficient processing SoC along with the accompanying software tools.

[Figure 2.1 (image not reproduced). Axes: flexibility versus performance and power efficiency. Technologies shown: ASIP, RASIP, DSP, GP GPU, FPGA, GPP, ASIC.]

FIGURE 2.1: Different technologies in the era of designing embedded systems-on-chip. Application-specific instruction-set processors (ASIPs) and reconfigurable ASIPs (RASIPs) combine the flexibility of general-purpose computing with the efficiency of ASICs in performance, power and cost.

2.2 Challenges and Opportunities: Programmability or Customization

The multi-core revolution has shifted the challenge from hardware (making systems run faster with higher clock rates) to software (utilizing the raw computation power provided by the additional cores). Embedded application developers today have more resources at their disposal and must use concurrent programming techniques to exploit them, which makes developing and deploying applications more challenging. Several parallel programming models exist: OpenMP, the Message Passing Interface (MPI), POSIX threads, and hybrid combinations of these three. Selecting the most appropriate model for a given embedded application requires expertise and a good command of each model, given the complexity imposed by cores competing for network bandwidth or memory resources. For their part, embedded platform providers offer sets of tools and libraries that simplify multi-core programming and help programmers harness the full potential of their processors. Usually, these include support for C/C++, standard programming paradigms, and advanced multi-core debugging and optimization tools.

Recently, design methodologies for managing exploding complexity have begun to consider embedded software from the early stages. Embedded systems are inherently application-specific. While system designers have to traverse a complex path involving different technologies and evolving standards, success depends on reacting to market shifts in time and minimizing the time to market. Thus, advanced multiprocessor architectures are built on a single chip that rely mainly on programming models (streaming, multi-threading) to support embedded applications efficiently. From a different perspective, however, current development strategies also employ software in the design and manufacturing process in another way: some attempt to tailor the hardware to the specific problem domain rather than the other way around.

An embedded system typically runs one specific application throughout its lifetime. This gives designers the opportunity to explore customized architectures for that application. The customization can take many forms: extending the instruction-set architecture (ISA) of the processor with application-specific custom instructions, adding a reconfigurable co-processor to the processor, and configuring various parameters of the underlying micro-architecture such as cache size, register file size, etc. However, given the short time-to-market constraint for embedded systems, this customization should be automatic. Modern techniques are shifting from retargetable compiler technologies to a complete infrastructure for fast and efficient design space exploration of the architectural alternatives.
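Automated design space exploration over micro-architectural parameters can be pictured as enumerating configurations, scoring each one, and keeping only the non-dominated trade-offs. The following is a minimal sketch under stated assumptions: `evaluate` is a made-up analytical cost model (a real flow would call a simulator or synthesis estimator), and the parameter ranges are hypothetical.

```python
from itertools import product


def evaluate(cache_kb, regs):
    """Hypothetical cost model returning (cycles, area); not measured data."""
    cycles = 1000.0 / cache_kb + 400.0 / regs   # bigger resources, fewer stalls
    area = 2.0 * cache_kb + 0.5 * regs          # bigger resources, more silicon
    return cycles, area


def pareto_front(points):
    """Keep configurations that no other configuration beats on both metrics."""
    front = []
    for cfg, (cycles, area) in points:
        dominated = any(
            c2 <= cycles and a2 <= area and (c2 < cycles or a2 < area)
            for _, (c2, a2) in points
        )
        if not dominated:
            front.append((cfg, (cycles, area)))
    return front


def explore(cache_sizes, reg_sizes):
    """Exhaustively score every (cache size, register count) pair."""
    points = [((c, r), evaluate(c, r)) for c, r in product(cache_sizes, reg_sizes)]
    return pareto_front(points)


for cfg, (cycles, area) in explore([4, 8, 16, 32], [16, 32, 64]):
    print(cfg, round(cycles, 1), round(area, 1))
```

Exhaustive enumeration is only workable for small parameter spaces; it is used here because it keeps the dominance check easy to follow.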

Whereas programmability allows the implemented algorithm to be changed to meet the requirements of the application, customization allows the embedded system-on-chip to be specialized so that performance and cost constraints are satisfied for a particular application domain.

2.2.1 Objectives

All research and industrial approaches fundamentally aim to partition an application into core-processor functions and custom functions that are located on the critical execution path. Under given system constraints (such as area cost, power consumption, schedule length, reliability, etc.) these custom functions are implemented efficiently in hardware to augment the baseline processor. Emerging standards and competitive markets, though, call for SoCs that are more flexible and scalable than customized hardware solutions.

For embedded SoC developers the objectives are to explore the huge design space efficiently and to combine automatic hardware selection with seamless compiler exploitation of the custom functions. With careful selection, the customizable functions can often be generalized so that they apply across a set of applications, because the computationally intensive portions of applications from the same domain are often similar in structure.
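One simple way to picture automatic hardware selection under an area constraint is as a 0/1 knapsack problem: each candidate custom function saves some cycles and costs some area, and the tool picks the subset with the largest saving that fits the budget. This is an illustrative formulation rather than the method of any particular tool, and the candidate names, cycle savings and area units below are invented.

```python
def select_custom_functions(candidates, area_budget):
    """0/1 knapsack: maximise saved cycles subject to an integer area budget.

    candidates  -- list of (name, saved_cycles, area) with integer area units
    area_budget -- total area available for custom hardware
    Returns (total_saved_cycles, sorted list of chosen names).
    """
    # best[a] holds the best (saving, names) achievable with area at most a.
    best = [(0, [])] * (area_budget + 1)
    for name, saved, area in candidates:
        new_best = list(best)
        for a in range(area, area_budget + 1):
            prev_saved, prev_names = best[a - area]  # table without this item
            if prev_saved + saved > new_best[a][0]:
                new_best[a] = (prev_saved + saved, prev_names + [name])
        best = new_best
    total, names = best[area_budget]
    return total, sorted(names)


# Hypothetical profiling results for four candidate custom functions.
CANDIDATES = [
    ("mac", 500, 3),
    ("crc", 300, 2),
    ("viterbi", 700, 5),
    ("bitrev", 150, 1),
]
print(select_custom_functions(CANDIDATES, 6))
```

The largest single saving (`viterbi`) is not selected in this example, because three smaller functions together save more within the same area. Real flows must also weigh power, schedule length and the other constraints, which this single-constraint sketch ignores.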

The system designer must be provided with an efficient, automated development environment. This environment can integrate compiler technology and software profiling with a synthesis methodology. Adding an analytical approach, or benchmarks together with a simulation methodology, significantly enhances such an environment. The interworking of all these technologies must assist in realistically tuning a multiprocessing SoC to fit a specific application.

[Figure 2.2 (image not reproduced). Panels: Tools / Methodologies; Criteria / Metrics; Embedded System Enhancements.]

FIGURE 2.2: Optimizing embedded systems-on-chips involves a wide spectrum of techniques. Balancing across often conflicting goals is a challenging task determined mainly by the designer’s expertise rather than the properties of the embedded application.

The proliferation of multimillion-gate chips and powerful design tools has paved the way for new paradigms. Network-on-chip (NoC) architecture provides a scalable and more efficient on-chip communication infrastructure for complex systems-on-chips (SoCs). NoC solutions are increasingly used to manage the variety of design elements and intellectual property (IP) blocks required in today’s complex SoCs. NoC-based multiprocessor SoCs (MPSoCs) have emerged and have significantly changed the way embedded applications are developed. ASIPs, NoCs, and MPSoCs make the application-specific hardware-software codesign spectrum even wider, as discussed in the following sections.

2.3 Categorization

2.3.1 Categorization of Customized Application-Specific Processor Techniques

Different mechanisms to configure and adapt a base system-on-chip (SoC) architecture to specific application requirements have been researched, usually together with a complete design tool and exploration environment. They span component-based construction of embedded systems (aided by architecture description languages), instruction set extensions of a base processor, and customization that ranges from design-time, application-specific tailoring to run-time system reconfiguration. Extensible processing combines elements from both traditional hardware and software development approaches to provide customized, per-application compute resources in the form of additional functional engines or accelerators, which are accessible to the designer through custom instructions.

Initial strategic decisions on developing an enhanced embedded SoC (targeting flexibility, i.e., not following an ASIC approach) can be classified as follows.

• Single processor, either extensible through its instruction set, or configurable by parameterizing the integrated hardware resources (multipliers, floating-point, DSP units, etc.), or augmented with coprocessors.

• Symmetric multiprocessor SoC (MPSoC). Partitioning and mapping of the embedded application to the processors can be done at compile time at the task or basic block level. Alternatively, the developer can provide hooks to the operating system to schedule tasks on the processors at runtime.

• Heterogeneous single-chip MPSoC, or asymmetric multiprocessing, which integrates multiple types of CPUs, irregular memory hierarchies, and irregular communication. Heterogeneous MPSoCs differ from traditional embedded systems in that the complexity and heterogeneity of the system significantly increase the complexity of the HW/SW partitioning problem. Evaluating performance and verifying correctness are also much more difficult than for traditional single-processor embedded systems. Programming a heterogeneous MPSoC is a further challenge, which arises simply because there are multiple programmable processing elements. Since these elements are heterogeneous, the software designer needs expertise in all of them and must take great care to make the software run as a whole.

• HW/SW codesign with a combination of the above architectural solutions. Hardware/software partitioning is usually a coarse-grain approach, while custom instruction sets find speedups at finer levels of granularity. Traditionally, architecture description languages (ADLs) have been used for this purpose.

• Network-on-Chip based multi-core SoCs. Given the aggregate demands of multi-core architectures, tools are emerging to help chip architects explore new interconnect topologies and perform application-specific analyses. Thus, it is feasible to optimize on-chip communications (bandwidth and latency) between IP cores, along with overall system characteristics such as power, die area, system-level performance, timing closure and time-to-market.

• Hardware synthesis from high-level languages. This concept continues to gain momentum in the electronic design automation community. An academic project (PACT) at Northwestern University provided a path from the MATLAB language to an implementation on a heterogeneous embedded computing platform, and was later commercialized as the AccelChip MATLAB-to-RTL VHDL tools targeting FPGAs [6]. Gupta et al. [24] present a framework that takes behavioral descriptions in ANSI C and generates synthesizable register-transfer level VHDL; emphasis is placed on effectively extracting parallelism for performance. PACT HDL similarly converts C programs to synthesizable hardware descriptions targeting both FPGAs and ASICs, optimizing for both power and performance [38]. Catapult C, Handel-C and Impulse C are recent products from various EDA companies that synthesize algorithms written in C/C++ directly into hardware descriptions.
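For the symmetric MPSoC case in the list above, compile-time mapping of tasks to processors can be sketched with a simple greedy heuristic: sort the tasks by estimated cost and always give the next one to the least-loaded core (longest-processing-time-first). The sketch assumes independent tasks with known cycle estimates and ignores communication cost and precedence constraints; the task names and cycle counts are hypothetical.

```python
import heapq


def map_tasks(task_cycles, num_cores):
    """Greedy longest-processing-time-first mapping onto identical cores.

    task_cycles -- dict of task name to estimated cycles (assumed profile data)
    num_cores   -- number of identical processors
    Returns (assignment, makespan), where assignment maps core id to task list.
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    heap = [(0, core) for core in range(num_cores)]  # (current load, core id)
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}
    # Longest tasks first; ties broken by name so the result is deterministic.
    ordered = sorted(task_cycles.items(), key=lambda kv: (-kv[1], kv[0]))
    for task, cycles in ordered:
        load, core = heapq.heappop(heap)             # least-loaded core
        assignment[core].append(task)
        heapq.heappush(heap, (load + cycles, core))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


TASKS = {"fft": 70, "filter": 50, "decode": 40, "ctrl": 30, "io": 10}
assignment, makespan = map_tasks(TASKS, 2)
print(assignment, makespan)
```

The heuristic does not guarantee an optimal schedule in general. A run-time alternative would leave the same decision to the operating system scheduler, as the text notes.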

Equally important as performance, power and cost is the time-to-market demand, which leads to systems (Tensilica Xtensa [63], ARC 700 [37], MIPS Pro Series [35], Stretch S6000 [36], Altera Nios II [53], Xilinx MicroBlaze [54]) that come with a pre-designed and pre-verified base architecture and an extensible instruction set. The pre-designed and verified base architectures reduce the design effort considerably, and the programmable nature of such processors ensures high flexibility.

The effectiveness of configurable and extensible processors has been demonstrated both for the early single-chip processors and for recent MPSoCs. The main techniques for application-oriented customized processing can be broadly outlined as:

• Extend the instruction-set architecture (ISA) of the processor with application-specific custom instructions

• Configure the base processor core with functional engines or attach coprocessor accelerators (possibly using reconfigurable technology)

• Customize the memory subsystem (accompanied by customized load/store semantics)

• Customize various parameters of the resources of the base architecture (cache size, register files, etc.)

• Off-load work to loosely coupled, flexible I/O processing.

The methodologies for applying processor configurability and extensibility vary, but in principle they follow these directions:

• Processor customization. Coarse-grain at the block level, by integrating processing units with a CPU, or fine-grain, by customizing the instruction set. Customization can be applied to a single embedded processor or in the context of homogeneous or heterogeneous multiprocessor architectures.

• Reconfigurable computing approach. Use a baseline processor with reconfigurable logic, soft or configurable processors; in addition a few approaches allow run-time reconfiguration.

• Reverse customization. Executable code to coprocessor generation: free from source-level partitioning and independent of the origin of the source (i.e., multiple source languages can be used), ASIPs are implemented directly from an executable binary targeted at the main processor. The executable code may be translated into a very different application-specific instruction set that is created for each coprocessor. The generated coprocessors range from fixed-function hardware accelerators to programmable ASIPs.

• Hardware Architecture Description Languages (ADL). ADLs enable embedded processor designers to explore the design space efficiently by modelling their processor in a high-level language and automatically generating instruction set simulators (ISSs) and a complete set of associated software tools, including the C compiler. For custom processors, such as application-specific instruction-set processors (ASIPs) for DSP and control applications, synthesizable register transfer level (RTL) code can also be generated automatically. Depending on the abstraction level, different ADLs have been designed for hardware-software codesign:

⋄ High-level ADL: an attribute grammar-based language is used for processor specification, and a synthesis tool then generates structural synthesizable VHDL/Verilog code for the underlying architecture from the specifications. nML [20], sim-nML [47], [8], ISDL [26] and FlexWare [51] belong to this class.

⋄ Low-level ADL: the MIMOLA [67] hardware specification language enables the designer to write a structural specification of a programmable processor at a low level, exposing several hardware details.

⋄ Complete ADL: the processor behavior can be described at the instruction level to tailor it to the application needs, and architecture design space exploration can be managed via integrated software toolchains together with architecture implementation and verification toolchains. LISA [31] is an example of this integrated development environment.
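The idea of deriving an instruction set simulator from a processor description can be illustrated with a toy model. This is not nML, ISDL, MIMOLA or LISA syntax: the description is a plain table mapping each mnemonic to an operand count and a semantics function, and a generic interpreter serves as the simulator. The mnemonics, the `mac` extension and the sample programs are all hypothetical.

```python
def _li(regs, ops):    # rd = immediate
    regs[ops[0]] = ops[1]


def _add(regs, ops):   # rd = rs1 + rs2
    regs[ops[0]] = regs[ops[1]] + regs[ops[2]]


def _mul(regs, ops):   # rd = rs1 * rs2
    regs[ops[0]] = regs[ops[1]] * regs[ops[2]]


def _mac(regs, ops):   # custom instruction: rd = rd + rs1 * rs2
    regs[ops[0]] += regs[ops[1]] * regs[ops[2]]


# Toy "architecture description": mnemonic -> (operand count, semantics).
BASE_ISA = {"li": (2, _li), "add": (3, _add), "mul": (3, _mul)}


def extend_isa(isa, mnemonic, arity, semantics):
    """Return a copy of the description with one extra instruction."""
    extended = dict(isa)
    extended[mnemonic] = (arity, semantics)
    return extended


def simulate(isa, program, num_regs=8):
    """Interpret a program against a description; return (registers, count)."""
    regs = [0] * num_regs
    executed = 0
    for mnemonic, *operands in program:
        if mnemonic not in isa:
            raise KeyError(f"unknown instruction: {mnemonic}")
        arity, semantics = isa[mnemonic]
        if len(operands) != arity:
            raise ValueError(f"{mnemonic} expects {arity} operands")
        semantics(regs, operands)
        executed += 1
    return regs, executed


SETUP = [("li", 0, 10), ("li", 1, 3), ("li", 2, 4)]
BASE_PROGRAM = SETUP + [("mul", 3, 1, 2), ("add", 0, 0, 3)]
MAC_PROGRAM = SETUP + [("mac", 0, 1, 2)]
EXT_ISA = extend_isa(BASE_ISA, "mac", 3, _mac)

print(simulate(BASE_ISA, BASE_PROGRAM))
print(simulate(EXT_ISA, MAC_PROGRAM))
```

Both programs leave the same value in register 0, but the extended description does so in one fewer instruction. A real ADL flow would additionally model timing, bit widths and pipeline structure, and would generate the compiler and RTL from the same description.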

ADL-based methodologies usually offer the maximum flexibility and efficiency at the expense of increased design time and significant effort. Meanwhile, working with pre-designed and pre-verified cores (e.g., Tensilica Xtensa, ARC Tangent, MIPS CorExtend) offers faster timing closure.

The above classification is not very sharp, for various reasons. Increasingly, programmable platforms are hybrids of the above forms of programmability, offering processors and programmable hardware on the same die. Further, the distinction between instruction and hardware programming bits is gradually becoming blurred.

In traditional hardware/software co-synthesis the custom hardware is in the form of predefined hardware computation elements (CEs) that reside in libraries. The outcome of the synthesis flow is principally a processor with a set of CEs permanently bound to it so as to accelerate a fixed assigned task. This is depicted in the shaded part of Figure 2.3 (a), which may include, for example, blocks to assist in DSP computations. In a different or complementary approach, design space exploration tools assist in defining the most efficient topology to interconnect a number of pre-designed and verified computation or interface components with one or more fixed CPUs (Figure 2.3 (b)).

Nowadays, heterogeneous or asymmetric multiprocessing is the most effective and competitive approach in the cost-conscious embedded SoC market segment. Adoption of embedded SMP is limited mostly by the immature SMP support in embedded OSs and compilation toolchains. For example, general-purpose processors (GPPs) and DSPs have distinctively different characteristics that make them best suited to different application domains. Thus,
