
Application development solutions White paper December A practical approach for developing multicore systems.


A practical approach for developing multicore systems.


Contents

Introduction
Practical and proven approaches are available for multicore development
Multicore systems are unique
Developing applications for a multicore smart phone
The load balancer application
Basic architecture and behavior
Using IBM Rational Rhapsody software and MDD for the load balancer
Introduction to protocol stacks and 4G/LTE technologies
System-level specification in SDL
Modeling the LTE protocol stack
Integration with OSs
Model execution and simulation of the LTE device
Conclusion

Introduction

Migration to multicore platforms is imminent for a majority of real-time and embedded (RTE) systems, including handheld devices for fourth-generation wireless networks. This presents a challenge to application developers. Today’s RTE systems usually contain multiple processing units, but individual units are typically dedicated to specific tasks rather than being general purpose. For example, in a mobile handset, one unit can be dedicated to protocol stack handling and others to the user interface.

Other design approaches allocate special tasks to dedicated systems, resulting in unnecessary hardware for situations that do not require a high processing load. Flexibility, scalability, power consumption and an optimized user experience will be the main requirements of future RTE systems. Scalability and flexibility are vital to enable fast time to market and allow manufacturers and service providers to be competitive.

Furthermore, power consumption has become an issue in feature-rich phones because the new features lead to higher usage of the phones. This decreases standby time, tightens battery constraints and increases heat dissipation. The user experience is associated with the functionalities of the applications and the quality of service (QoS) delivered by the wireless networks, both of which are affected by processing units that are dedicated to specific tasks, such as protocol stack handling, rather than being general purpose.

Practical and proven approaches are available for multicore development

Exploiting advanced development, microkernel, load-balancing and virtualization techniques provides an innovative approach for applications development on multicore hardware platforms. Working together, they represent a series of best practices for multicore system development.

Considering the complexity posed by multiple cores, model-driven development (MDD) and platform virtualization techniques allow execution of a broad set of existing applications in new environments by using virtual machines to run complete operating systems (OSs) along with their underlying applications. A case study from our work on fourth-generation/third-generation long-term evolution (4G/3G LTE) wireless handsets demonstrates how these best practices provide a practical approach to development. We’ve been working with the Embedded Multi-Core Processing for Mobile Communication (eMuCo) consortium, a European Union–funded multicore research project that includes membership from IBM, Infineon Technologies, ARM and a group of universities across Europe. The lessons we’ve learned from this project are applicable to a wide range of RTE development segments.

The efficacy of the practical approach can be demonstrated with a 4G/3G LTE application.

Multicore systems are unique

From an architectural perspective, multicore hardware platforms adopt a series of configurations, which in most cases are customized to the requirements posed by each product under development, as displayed in figure 1.

[Figure 1 panels: (a) SMP, with Linux/RTOS/other OS running across the cores; (b) AMP, with a supervisor and OS1 and OS2 on separate cores]

The majority of RTE systems use the AMP architecture to support dedicated systems.

Figure 1: Different multicore platform configurations

Seamless integration of diverse software environments

While the majority of enterprise and IT systems use the symmetric multiprocessor (SMP) form, the majority of RTE systems tend to use the asymmetric multiprocessor (AMP) architecture described above because dedicated systems are required to perform a diverse set of functions ranging from time-critical wireless communication to processor-intensive video streaming. Although the former use case requires a fast, real-time microkernel that’s dedicated to a separate core, displaying video requires many layers of applications such as decoders, display managers and more. The latter use case may favor a sophisticated OS, such as the Linux® platform, running on one or more dedicated cores. A combination of a microkernel together with a secondary OS, such as the Linux platform, for running applications such as video, load balancer and virtualization engine, provides a spatial and temporal separation of the resources, allowing a seamless and reliable integration of different software environments.

Concurrent and parallel execution of applications and control signals

Meeting deadlines of real-time transactions, such as shortening the response time of online transactions, is critical in most RTE systems because it directly influences the user experience. A multicore hardware platform together with a load balancer provides logical simultaneous processing on each core (concurrency) through priority management and provides physically simultaneous processing (parallelism) through core allocation.
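As a minimal illustration of the distinction, the sketch below models each core as a ready queue. Choosing the highest-priority thread on a core stands for concurrency, and running one thread per core in the same tick stands for parallelism. The thread names, priorities and core assignments are hypothetical, and the convention that a larger number means a higher priority is an assumption rather than something the paper states.

```python
def run_tick(ready_queues):
    """Pick one thread per core for the next time slice.

    ready_queues maps a core ID to a list of (priority, thread_name) pairs.
    Assumption: a larger number means a higher priority.
    """
    running = {}
    for core, threads in ready_queues.items():
        if threads:
            # Concurrency: threads sharing a core are interleaved by priority.
            priority, name = max(threads)
            threads.remove((priority, name))
            running[core] = name
    # Parallelism: every entry in `running` executes during the same tick.
    return running


# Hypothetical workload: two threads share core 0, one thread owns core 1.
queues = {
    0: [(200, "modem_l2"), (50, "ui_refresh")],
    1: [(120, "video_decode")],
}
first = run_tick(queues)
second = run_tick(queues)
print(first)
print(second)
```

In the first tick both cores are busy at once; in the second tick only the lower-priority thread left on core 0 runs.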

Scalability and flexibility are trends of the future RTE systems

An effective architecture allows scalability in three main areas. For example, with a wireless mobile device, flexibility to grow is needed in the application subsystem, in the modem subsystem and in the number of cores selected in the hardware platform. Scalability in the system’s architecture is essential to address the rapid changes in the mobile marketplace.

Multiple cores are only as effective as the parallel software architecture driving them

The software architecture needs to take advantage of the available multicore hardware platform. A model-driven approach becomes an essential part of the multicore development workflow, from trade-off studies, which involve model execution on different core combinations, through to parallel code generation. MDD becomes a fast and suitable software development framework for scalable and parallel hardware architectures.

MDD provides a fast development framework to support scalability in the mobile marketplace.

Developing applications for a multicore smart phone

To better understand this practical approach for multicore applications development, let’s look at the eMuCo smart phone described in figure 2. The proposed multicore hardware platform is effectively exploited by a combination of the L4 microkernel, the load balancer and the virtualization subsystem.

[Figure 2 diagram labels: multitasking application subsystem (App, OS 1, OS 2, MM VM; multimedia, IP voice); multithreaded “modem” subsystem (2.5G, 3G, LTE, 4G modem stack); load balancer “dynamic task mapping”; basic resource management; L4 microkernel; multicore hardware platform (Core 1 to Core 4); user mode and system mode; adaptability]

Figure 2: A conceptual view of a 4G/LTE smart phone built on an AMP, multicore configuration. The number of asymmetric OSs is further multiplied through the use of virtual machines (VMs).

A smart phone’s multicore platform shows how the practical approach exploits the L4 microkernel, load balancer and virtualization subsystem.

Future mobile devices will incorporate multiple radio access technology standards including Universal Mobile Telecommunications System (UMTS) and the 3rd Generation Partnership Project Long Term Evolution (3GPP LTE) to provide broadband mobile-data access and the best possible QoS in the user’s current environment. Other 4G wireless technologies such as WiMAX are implemented using a similar overall solution. It is widely accepted that mobile devices (smart phones) will experience exponential growth in the usage of multimedia applications such as video streaming, videoconferencing, complex graphics and more, increasing the demand for processing power, which cannot be addressed much further through accelerating the processor clock. To offer the services demanded by the user, as well as to facilitate application portability, the coexistence of multiple software environments is a must. The eMuCo LTE handset addresses these requirements through the isolation of the core device functionalities: the wireless modem subsystem and the L4 scheduler, which are timing-critical and require true real-time control; and the user-domain applications, which constitute the “soft” real-time part of the device’s software application.

Software architecture of the smart phone

Software architecture in the user (application) mode occurs at two levels. First, the protocol stacks are modeled in Specification and Description Language (SDL), from which C source code is eventually generated; the models benefit from full validation through model execution. C is chosen because of the timing-critical, hard real-time nature of the protocol stack; code generated in C is relatively easy to control for size and performance after the executable application is generated. For softer real-time applications such as the load balancer, the source code is generated from the application layer, modeled in Unified Modeling Language (UML), in this case using IBM Rational® Rhapsody® software. One of the flexibilities offered by Rational Rhapsody software is the ability to generate both C and C++ source code for varying performance and criticality requirements. By mapping the application threads in the model to the OS threads, it is possible to experiment by trial and error with thread distribution from the application to the OS and subsequently onto the individual cores.

Coexistence of multiple software environments in a mobile device is critical to meet user demand.

Protocol stacks in the user mode are modeled in SDL to generate C source code.

Figure 2 illustrates the user mode applications and their relationship to the rest of the multicore system. Note that only the applications running on the L4 microkernel are shown. Embedded systems, such as our smart phone, use the AMP configuration described in figure 1-b, where the Linux OS is shown as “OS2.” The other counterpart, “OS1,” or the L4 microkernel, is a minimal computer OS, which only provides a basic set of kernel functionality and allows building user-level services by providing mechanisms such as address space management, thread handling and inter-process communication (IPC). Virtualization allows running a broad set of existing applications in new environments by using virtual machines to run whole OSs along with their applications. A second layer of virtualization is used in the modeled stack layer. Here the underlying protocol stacks drive the communications subsystem in the LTE device. Functionality of the stack can be validated on the host machine, guaranteeing successful integration on the actual LTE handset, when it’s available.

Real-time applications are modeled in UML to generate both C and C++ source code.

Virtualization allows running a broad set of existing applications in new environments.

The load balancer application

The load balancer provides the necessary services to support multicore development, such as the allocation of tasks and threads to the multiple cores, priority management and thread monitoring. The scheduling policies depend not only on the performance metrics obtained from the hardware platform or other applications but also on the extent of interdependency between processes, for example, how much IPC happens within each core versus across different cores. The main requirements defined for the load balancer are as follows:

• The load balancer should ensure that enough computation power is provided for each running thread.
• The load balancer should ensure an optimal distribution of threads on the available cores to improve power consumption where possible.
• The load balancer should identify operational modes and change to these modes as quickly as possible.

Through a model-based trial-and-error approach, an optimum subset of run-time use cases is created. These use cases collectively comprise the input to the load balancer. As shown in figure 2, the load balancer acts as a layer between multicore applications and the multicore run-time executive, essentially creating a control plane for the latter’s execution.
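The trial-and-error search over thread placements can be sketched as a small exhaustive evaluation. This is only an illustration under stated assumptions: the thread names, CPU loads and IPC volumes are hypothetical, the number of active cores is used as a stand-in for power consumption, and the scoring order (fewest cores first, then least cross-core IPC) is a choice made for the example, not the eMuCo policy.

```python
from itertools import product


def cross_core_ipc(mapping, ipc):
    """Sum the IPC volume exchanged between threads placed on different cores."""
    return sum(vol for (a, b), vol in ipc.items() if mapping[a] != mapping[b])


def best_mapping(threads, load, ipc, n_cores, capacity=1.0):
    """Try every thread-to-core mapping and keep the best-scoring one.

    Prefers fewer active cores (a stand-in for lower power), then less
    cross-core IPC. Returns None if no mapping fits the per-core capacity.
    """
    best = None
    for cores in product(range(n_cores), repeat=len(threads)):
        mapping = dict(zip(threads, cores))
        per_core = [0.0] * n_cores
        for thread, core in mapping.items():
            per_core[core] += load[thread]
        if max(per_core) > capacity:
            continue  # some thread would not get enough computation power
        score = (len(set(cores)), cross_core_ipc(mapping, ipc))
        if best is None or score < best[0]:
            best = (score, mapping)
    return best


# Hypothetical inputs: CPU share per thread and IPC volume per thread pair.
threads = ["pdcp", "rlc", "mac", "video"]
load = {"pdcp": 0.3, "rlc": 0.3, "mac": 0.3, "video": 0.6}
ipc = {("pdcp", "rlc"): 10, ("rlc", "mac"): 10, ("pdcp", "video"): 2}

score, mapping = best_mapping(threads, load, ipc, n_cores=4)
print(score, mapping)
```

With these numbers the three stack threads end up together on one core and the video thread on another, so only the light IPC link between the stack and video crosses cores. Exhaustive search is workable only for a handful of threads, which is consistent with deriving a small set of use cases offline.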

The load balancer provides the necessary services to support multicore development.

Basic architecture and behavior

The load balancer provides support for real-time and non-real-time performance and timing requirements. The architecture of the load balancer for a smart phone addresses the problem of CPU sharing between applications with wide-ranging requirements, while keeping the total power consumption of the system low. Figure 3 shows the system deployment diagram placing the load balancer alongside other applications, namely a video playback application on Linux. Threads from the protocol stack are managed by a run-time environment generated from the modeling tool, which in turn executes on the L4 microkernel; the resulting dual-kernel environment offers the flexibility of executing the protocol stack either as a single, monolithic task on L4 or broken down into multiple L4 threads. Note that because the microkernel environment manages load distribution, this setup offers flexibility to map one, two or more protocol stack threads onto separate cores, which is beneficial during high modem traffic when the stack becomes a performance bottleneck. Spreading the stack execution across multiple cores greatly relieves the bottleneck.

The modeling tool generates a run-time environment to manage the threads of the protocol stack as either a single, monolithic task or broken down into multiple threads.

To address the major requirement of conserving battery power, two major modes of operation were specified: low-bandwidth mode and high-bandwidth mode. The low-bandwidth mode implies lower video quality because both the video and protocol stack are located on the same core. The high-bandwidth (higher video quality) mode, by definition, is distributed over two to three cores, providing more data throughput but consuming more power.

A request for a mode switch (low bandwidth to high bandwidth or vice versa) reaches the load balancer as a signal originating in the protocol stack. In response, the load balancer initiates the mode switch. Figure 4 shows the various components of the load balancer, including the load balancer engine (LB engine) and the load balancer proxy (LB proxy). The LB proxy runs on each CPU, receives information about the status of the threads on the corresponding CPU and sends the information to the LB engine. The LB engine in turn contains an engine component where the load-balancing logic is implemented. The thread table contains information about all threads in the system and is initialized at system startup. The thread monitor receives information about the release time and deadline of threads from the LB proxy and updates the thread table.
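The interaction between the mode signal, the LB engine and the per-CPU proxy reports can be sketched as follows. This is a simplified illustration, not the Rational Rhapsody model: the class and method names, the two-thread workload and the per-mode placements are all hypothetical.

```python
LOW, HIGH = "low-bandwidth", "high-bandwidth"

# Hypothetical placements per mode: thread name -> core ID.
PLACEMENTS = {
    LOW: {"protocol_stack": 0, "video": 0},   # both share one core
    HIGH: {"protocol_stack": 0, "video": 1},  # spread out for more throughput
}


class LBEngine:
    """Keeps a thread table and reacts to mode-switch signals."""

    def __init__(self):
        self.mode = LOW
        self.thread_table = dict(PLACEMENTS[LOW])
        self.status = {}  # latest timing report per thread

    def on_mode_signal(self, requested_mode):
        """Handle a signal from the protocol stack; return the threads that moved."""
        if requested_mode == self.mode:
            return []
        target = PLACEMENTS[requested_mode]
        moved = [t for t, core in target.items() if self.thread_table[t] != core]
        self.thread_table.update(target)
        self.mode = requested_mode
        return moved

    def on_proxy_report(self, cpu, thread, release_time, deadline):
        """Thread-monitor role: record what an LB proxy observed on its CPU."""
        self.status[thread] = {"cpu": cpu, "release": release_time, "deadline": deadline}

    def active_cores(self):
        return sorted(set(self.thread_table.values()))


engine = LBEngine()
engine.on_proxy_report(cpu=0, thread="video", release_time=0, deadline=40)
moved = engine.on_mode_signal(HIGH)
print(moved, engine.active_cores())
```

Switching to the high-bandwidth mode moves only the video thread, which brings a second core into use; switching back would release it again.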

Resource requirements for every application can be described in the form of a contract of the application with the rest of the system. The resource requirements can be expressed in the form of minimum budget (CPU time requirement), maximum period and number of cores that will provide the required budget in the given period. For each such contract, offline or online negotiations shall be made across the whole system to determine whether the minimum required computation power can be guaranteed to all accepted contracts.
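A contract of this kind, and a very simple negotiation over it, might look like the sketch below. The field names, the example applications and their numbers are hypothetical. The acceptance test is an assumption: it only checks that total demand does not exceed the number of cores, which is a necessary condition and not a full schedulability analysis.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Contract:
    name: str
    budget_ms: int  # minimum CPU time needed in each period
    period_ms: int  # maximum period over which the budget must be delivered
    cores: int      # number of cores that each provide the budget

    def demand(self):
        """Capacity requested by this contract, measured in cores."""
        return self.cores * Fraction(self.budget_ms, self.period_ms)


def negotiate(requests, n_cores):
    """Admit contracts in order while total demand still fits on n_cores.

    Assumption: a utilization bound serves as the acceptance test; a real
    negotiation would also have to check where each budget can be placed.
    """
    accepted, used = [], Fraction(0)
    for contract in requests:
        fits = contract.cores <= n_cores and used + contract.demand() <= n_cores
        if fits:
            accepted.append(contract.name)
            used += contract.demand()
    return accepted, used


# Hypothetical requests on a four-core platform.
requests = [
    Contract("modem_stack", budget_ms=6, period_ms=10, cores=2),
    Contract("video", budget_ms=38, period_ms=40, cores=3),
    Contract("ui", budget_ms=5, period_ms=50, cores=1),
]
accepted, used = negotiate(requests, n_cores=4)
print(accepted, float(used))
```

Here the video request is turned down because admitting it would push total demand past four cores, while the smaller UI contract still fits.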

The protocol stack can signal the load balancer to switch from low-bandwidth to high-bandwidth mode, depending on the application.

An application’s resource requirements can be described as a contract with the rest of the system.

Using IBM Rational Rhapsody software and MDD for the load balancer

The four main components of the load balancer, identified above, are shown in figure 4 as a model in Rational Rhapsody software. To manipulate threads that run on all cores, the LB engine needs to internally maintain a thread table containing information about the threads running in the system. The thread table should contain at least these fields:

• Thread ID (integer)
• CPU ID (integer between 0 and n, where n is the total number of cores minus one; in this case, n=3)
• Priority (integer between 0 and 255)
• Optional: thread constraints
• Release time (the moment when thread execution is requested)
• Deadline time (the maximum time until the task must be completed)

Two possible design decisions may occur here, depending on the way we choose to maintain the thread table. One possibility is to simplify the approach and consider that all threads that will eventually run in the system are statically created at system initialization. The thread table can also be statically allocated and initialized and would never have to change in size during the lifetime of the system. The second choice is to allow dynamic thread creation and let the load balancer cope with it by constantly updating the thread table as soon as a new thread is created or an existing thread is killed.
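Both design choices can be illustrated with one small data structure. The sketch below is not the generated Rhapsody code; the class names, method names and time units are hypothetical, and only the field list and value ranges come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

N_CORES = 4  # CPU IDs run from 0 to n = 3, matching the text


@dataclass
class ThreadEntry:
    thread_id: int
    cpu_id: int
    priority: int
    release_time: Optional[int] = None  # optional constraint, units assumed
    deadline: Optional[int] = None      # optional constraint, units assumed

    def __post_init__(self):
        if not 0 <= self.cpu_id < N_CORES:
            raise ValueError("cpu_id out of range")
        if not 0 <= self.priority <= 255:
            raise ValueError("priority out of range")


class ThreadTable:
    """Thread table that is either fixed at startup or allowed to change size."""

    def __init__(self, entries, dynamic=False):
        self.dynamic = dynamic
        self.rows = {e.thread_id: e for e in entries}

    def _require_dynamic(self):
        if not self.dynamic:
            raise RuntimeError("static table: size is fixed at initialization")

    def thread_created(self, entry):
        self._require_dynamic()
        self.rows[entry.thread_id] = entry

    def thread_killed(self, thread_id):
        self._require_dynamic()
        del self.rows[thread_id]

    def update_timing(self, thread_id, release_time, deadline):
        # Allowed in both designs: a row changes, the table size does not.
        row = self.rows[thread_id]
        row.release_time, row.deadline = release_time, deadline


static_table = ThreadTable([ThreadEntry(1, cpu_id=0, priority=200),
                            ThreadEntry(2, cpu_id=1, priority=90)])
static_table.update_timing(1, release_time=0, deadline=10)

dynamic_table = ThreadTable([ThreadEntry(1, cpu_id=0, priority=200),
                             ThreadEntry(2, cpu_id=1, priority=90)], dynamic=True)
dynamic_table.thread_created(ThreadEntry(3, cpu_id=3, priority=10))
dynamic_table.thread_killed(2)
print(sorted(dynamic_table.rows))
```

The static variant trades flexibility for predictability: its memory footprint is known at startup, which is often attractive on an embedded target, whereas the dynamic variant has to handle creation and removal at run time.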

The load balancer engine maintains a thread table with either static or dynamic allocation.


LTE will enable much higher data rates to support multimedia applications.

Figure 4: Main components of the load balancer in Rational Rhapsody software

Introduction to protocol stacks and 4G/LTE technologies

LTE is the successor of the UMTS wireless technology. It will enable much higher data rates along with much lower packet latencies in an Internet Protocol (IP)–based system. LTE will provide maximum data rates of 100Mbps in the downlink and 50Mbps in the uplink direction. Another aspect is the dramatic increase of multimedia applications in the broadest sense, including video streaming, videoconferencing and online gaming. Initiated in 2004, the LTE project focuses on enhancing the Universal Terrestrial Radio Access (UTRA) and optimizing the 3GPP radio access architecture.

Using MDD for developing protocol stacks for smart phones

Besides other domains, a majority of wireless base stations and personal handset devices currently use the SDL and Testing and Test Control Notation (TTCN) version standards for protocol development. Consequently, SDL-generated and TTCN-tested systems now drive a majority of all 3G wireless technology in the world.

As the merits of domain-specific modeling are now becoming apparent, one has to commend the foresight used in development of the SDL language more than two decades ago. SDL, coupled with TTCN, has provided a solid workflow for model-driven and agile development approaches for second-generation (2G) and 3G wireless systems and is slated to repeat the same for the next-generation and 4G wireless protocol development. It was a logical choice to use SDL in the development of a light version of the LTE protocol stack layer 2 (L2) and layer 3 (L3) for the smart phone.

Furthermore, IBM Rational SDL Suite software offers an integrated workflow with Rational Rhapsody software, allowing the smart phone applications developed in Rational Rhapsody software to communicate with the protocol stack in Rational SDL Suite software. This feature allows automation such as generation of a single executable from both modeling tools and synchronized model execution with cross-simulation support.

SDL and TTCN can work together to provide integrated workflow to support 4G wireless protocol development.

System-level specification in SDL

As a first step of our modeling, shown in figure 1, the Message Sequence Charts (MSCs) are created. Several scenarios are plotted in conformance with the LTE standard. As an example, figure 5 demonstrates the MSC for a target LTE protocol scenario. The MSC represents a looped LTE protocol data path from the IP layer through L2 uplink to L2 downlink. Therefore, the MSC is used for system analysis. The system can also be verified by comparing the designed MSC with the MSC generated by the IBM Rational SDL Suite simulator user interface (UI). The application data rate is realized using the timer T1 and the IP packet payload length. In the uplink direction (see the upper part of figure 5) the IP packet should be processed in the Packet Data Convergence Protocol (PDCP) sublayer and sent to the Radio Link Control (RLC) sublayer to be concatenated with other PDCP protocol data units (PDUs) depending on the transmission opportunity notification from the Medium Access Control (MAC) sublayer. The data rate in the mobile terminal is determined by the timer T2 together with the transport block size.
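Two of the relationships above lend themselves to a short worked sketch: a timer and a size together fix a data rate, and the RLC sublayer packs queued PDCP PDUs into whatever the MAC transmission opportunity allows. The numbers are hypothetical, and the packing is deliberately simplified (no RLC headers and no segmentation), so it is not a rendering of the 3GPP procedures.

```python
def data_rate_bps(size_bytes, timer_ms):
    """Bits per second when size_bytes are emitted once every timer_ms milliseconds."""
    return size_bytes * 8 * 1000 // timer_ms


def rlc_concatenate(pdcp_queue, opportunity_bytes):
    """Pack whole PDCP PDUs from the head of the queue into one RLC payload.

    Simplification: no RLC headers and no segmentation. A PDU that does not
    fit stays queued until the next transmission opportunity.
    """
    taken, used = [], 0
    while pdcp_queue and used + len(pdcp_queue[0]) <= opportunity_bytes:
        pdu = pdcp_queue.pop(0)
        taken.append(pdu)
        used += len(pdu)
    return b"".join(taken)


# Hypothetical numbers: a 1,250-byte IP payload sent every T1 = 1 ms.
app_rate = data_rate_bps(1250, timer_ms=1)

# Three queued PDCP PDUs and a 1,000-byte transmission opportunity from MAC.
queue = [b"a" * 400, b"b" * 500, b"c" * 300]
rlc_payload = rlc_concatenate(queue, opportunity_bytes=1000)
print(app_rate, len(rlc_payload), len(queue))
```

The same rate function applies to the terminal side by passing the transport block size and T2 instead of the payload length and T1.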

In the downlink, when the MAC sublayer receives a transport block, it should process its header and forward the RLC PDUs to the upper sublayers (see figure 5). The PDCP PDUs should be extracted from the received RLC PDUs and sent to the PDCP sublayer to feed the IP layer with the IP packets.
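The layered processing in the downlink can be sketched as successive unwrapping steps, with a matching uplink used only to build a test transport block for the looped data path. The framing here is a toy format invented for the example (a one-byte layer tag followed by length-prefixed records); real LTE MAC, RLC and PDCP headers are different and considerably richer.

```python
import struct


def wrap(tag, payloads):
    """Toy framing: 1-byte layer tag, then (2-byte length, payload) records."""
    body = b"".join(struct.pack(">H", len(p)) + p for p in payloads)
    return tag + body


def unwrap(tag, pdu):
    """Check the layer tag and return the list of contained payloads."""
    if pdu[:1] != tag:
        raise ValueError(f"expected {tag!r} header")
    payloads, pos = [], 1
    while pos < len(pdu):
        (length,) = struct.unpack_from(">H", pdu, pos)
        pos += 2
        payloads.append(pdu[pos:pos + length])
        pos += length
    return payloads


def uplink(ip_packets):
    """IP -> PDCP -> RLC (concatenation) -> MAC transport block."""
    pdcp_pdus = [wrap(b"P", [packet]) for packet in ip_packets]
    rlc_pdu = wrap(b"R", pdcp_pdus)
    return wrap(b"M", [rlc_pdu])


def downlink(transport_block):
    """MAC -> RLC -> PDCP -> IP, mirroring the description above."""
    ip_packets = []
    for rlc_pdu in unwrap(b"M", transport_block):      # MAC: process header
        for pdcp_pdu in unwrap(b"R", rlc_pdu):         # RLC: extract PDCP PDUs
            ip_packets.extend(unwrap(b"P", pdcp_pdu))  # PDCP: feed the IP layer
    return ip_packets


packets = [b"ip-packet-1", b"ip-packet-2"]
received = downlink(uplink(packets))
print(received == packets)
```

Looping the uplink output straight into the downlink is the same idea as the looped data path in the MSC: whatever enters at the IP layer should come back out unchanged.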

MSCs are created as the first step in modeling scenarios and also are used for system analysis.


The MSC models the LTE data path in uplink and downlink.

Figure 5: The MSC for LTE data path in uplink and downlink, showing packet flow from L3 through L2 and the transport block reception and processing in both layers

Modeling the LTE protocol stack

The LTE protocol stack modeled in the SDL system is composed of two blocks, named LTE_PS and Radio Interface, as shown in figure 6(a). The radio interface forwarding functionality is modeled using two processes, one for receiving the transport block and one for forwarding it to the downlink side. As illustrated in figure 6(b), inside the LTE_PS block there are four subblocks and more than 20 processes used to model the L2 and L3 layers of the LTE protocol stack. All presented models reflect the LTE standard, but many are out of scope for this overview paper and are left out.

Radio interface forwarding functionality is modeled to include both receiving and forwarding the transport block.

Figure 6: LTE protocol hierarchy; levels include (a) SDL-based system, (b) LTE protocol block, (c) MAC subblock and (d) MAC uplink process

Integration with OSs

The smart phone implementation in SDL is integrated with the OS. The modeled system is divided into separate execution threads using the deployment editor in Rational SDL Suite software. As is the case with true MDD tools, the multithreaded model is generated as C code by the Rational SDL Suite C-extreme code generator. This source code is usually compiled and linked with the handwritten C code as well as C libraries provided by the tool for the purpose of OS platform adaptation.

Model execution and simulation of the LTE device

Both IBM Rational modeling tools used are real-time software development solutions that provide specification and development capabilities for complex, event-driven communications systems and protocol software. IBM Rational SDL Suite, Version 6 software is used to analyze the LTE model and automatically generate C code. The generated code is compiled and linked with the handwritten C implementation for header processing (see figure 1). The overall system is simulated to check the functionality and compare it with design target MSCs presented earlier.

IBM Rational modeling tools check system functionality and compare it with design target MSCs.

As illustrated in figure 7, the IP packets propagate through PDCP, RLC and MAC sublayers to the radio interface. The SDL processes that are not relevant to the uplink are removed from the MSC for the sake of clarity.

The MSC created automatically during model-level simulation can be compared to the specification MSC.

Figure 7: An MSC created through model-level simulation shows data transfer through L2 and L3. This MSC can be compared to the MSC defined as part of the specification for functional/unit testing and regression testing.

The received transport block is processed in MAC, RLC, PDCP and IP layers to extract the IP packet payload (see figure 7). The MSC is comparable to the target design MSC shown in figure 5. The functional differences between the two versions are negligible, even though the simulation output renders many more irrelevant objects. The SDL processes that are not relevant to the downlink are removed from the MSC for the sake of clarity. From a functional point of view, the LTE protocol considered in this work is successfully implemented using SDL.
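Comparing a simulated MSC against the specification MSC, after dropping the processes that are not relevant, can be sketched as a filtered trace comparison. This is not how Rational SDL Suite performs the comparison; the trace representation, process names and message names below are hypothetical and serve only to show the idea.

```python
def project(trace, relevant):
    """Keep only messages whose sender and receiver are both relevant."""
    return [m for m in trace if m[0] in relevant and m[1] in relevant]


def first_mismatch(spec, simulated, relevant):
    """Return the index of the first differing message, or None if they match."""
    observed = project(simulated, relevant)
    for index, (want, have) in enumerate(zip(spec, observed)):
        if want != have:
            return index
    if len(spec) != len(observed):
        return min(len(spec), len(observed))
    return None


# Each message is (sender, receiver, name). All names are illustrative.
spec_uplink = [
    ("IP", "PDCP", "ip_packet"),
    ("PDCP", "RLC", "pdcp_pdu"),
    ("RLC", "MAC", "rlc_pdu"),
    ("MAC", "RadioInterface", "transport_block"),
]
simulated = [
    ("Timer", "IP", "t1_expired"),  # extra object present only in simulation
    ("IP", "PDCP", "ip_packet"),
    ("PDCP", "RLC", "pdcp_pdu"),
    ("Timer", "MAC", "t2_expired"),
    ("RLC", "MAC", "rlc_pdu"),
    ("MAC", "RadioInterface", "transport_block"),
]
relevant = {"IP", "PDCP", "RLC", "MAC", "RadioInterface"}

result = first_mismatch(spec_uplink, simulated, relevant)
print(result)
```

A result of None means the simulated trace agrees with the specification once the extra timer objects are filtered out, which is the sense in which the two MSCs are called comparable. The same check can be rerun after each model change as a regression test.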

Conclusion

For future work, we propose that the LTE implementation presented in this paper be integrated with any general-purpose OS that supports the Portable Operating System Interface (POSIX) standard and runs on a multicore embedded system.

For more information

To learn more about taking a practical approach to multicore system development, contact your IBM representative or visit:

ibm.com/software/rational/products/rhapsody/developer/

Acknowledgments

The authors acknowledge the excellent cooperation with all partners within the ICT–eMuCo consortium, supported by the European Commission.


