• No results found

Application Source Codes Profiling for ASIP Memory Subsystem Design

N/A
N/A
Protected

Academic year: 2021

Share "Application Source Codes Profiling for ASIP Memory Subsystem Design"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

Procedia Engineering 29 (2012) 3160 – 3164 1877-7058 © 2011 Published by Elsevier Ltd. doi:10.1016/j.proeng.2012.01.458 Procedia Engineering 00 (2011) 000–000

Procedia

Engineering

www.elsevier.com/locate/procedia

2012 International Workshop on Information and Electronics Engineering (IWIEE)

Application Source Codes Profiling for ASIP Memory

Subsystem Design

Xiaoyang Li

a

, Wenbiao Zhou

a*

, Dake Liu

a

aInstitute of ASIP, School of Information and Electronics, Beijing Institute of Technology, Beijing, China. 100081

Abstract

Application Specific Instruction-set Processors (ASIPs) are a realistic solution for domain-specific applications. To reach optimal system-level performance, memory subsystem design is considered in the pre-architecture design stage, which narrows down the huge design space to applications of a specific domain. Source code profiling approach aims to understand the characteristics of applications to guide ASIP design. The memory profiler proposed in the paper uses dynamic profiling technique to generate memory traces, and the live intervals of memory objects are computed by load-store information. Then memory requirement diagram is plotted according to instruction counts. The minimum memory requirement of the application is acquired from the diagram and guides the design of memory subsystem. The profiler is tested using a computing kernel, and the memory subsystem design suggestions are given according to the profiling results.

© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of Harbin University of Science and Technology

Keywords: ASIP ; profiling ; memory;

1. Introduction

Comparing to general purpose processors (GPPs) and application specific integrated circuits (ASICs), application specific instruction-set processors (ASIPs) are a realistic solution for flexibility and high-performance [1]. However, ASIP design space is quite huge and even experienced designers could not handle it properly [2]. Most ASIP design flow relies on the accurate understanding of application source codes.

* Corresponding author. Tel.: +86-10-6891-8279; fax:+86-10-6891-8279.

E-mail address: [email protected].

Open access under CC BY-NC-ND license.

(2)

Fig. 1. Proposed ASIP design flow

Our ASIP design flow is illustrated in Fig. 1. Application source codes are firstly profiled and analyzed. The results of profiling are the design documents of instruction-set and memory architecture suggestions. Architecture engineers design the architecture details and write architecture description language (ADL). The ADL is then synthesized using architecture synthesis tools, such as NoGAP[3]. Finally, the architecture is tested and benchmarked. The design flow may have several iterations to meet timing, area, power etc. requirements. The “memory wall” is the growing disparity of speed between processor and memory. It was expected that memory latency would become an overwhelming bottleneck in the system performance. It becomes the trend to estimate the memory cost and give a rough design of memory hierarchy in the early stage of architecture design. A common technique for understanding source codes is profiling. By instrumenting extra codes in the sources, profilers record execution frequencies of functions, basic blocks and lines.

Some research fills the gap between source codes and memory subsystem, such as Micro-profiler [4]. Micro-profiler instruments intermediate representation of application source codes, and collects execution

(3)

counters of basic blocks. It memory profiler is able to track the read and write counts of variables and presents in GUI.

This work presents a profiling approach to analyse application source codes to analyse application memory requirements. Firstly, memory trace generation method is designed and implemented in Low Level Virtual Machine (LLVM). Secondly, the memory requirement along the program execution time line is plotted. Architecture designers reply on this information and consider other factors, such as area cost and design complexity, to design the actual memory hierarchy.

This paper is based on LLVM compilation framework, and relies on its interpreter execution capability to record memory traces. In addition, this paper aims to provide application memory access patterns for memory hierarchy design and try to estimate the minimum memory requirement of the application.

2. Profiler design

1.1. Memory trace generation

Memory traces are the access operations to memories, such as load a value from a memory and store a value to memory. Memory objects are data values than could be loaded or stored in memory. In this work, not considering instruction memory, memory objects are equivalent to variable in the program. The memory objects in the program codes are divided into three types:

• Global and static variables

• Locally declared variables, which are placed on the run-time stack

• Dynamically allocated variables, which are placed in heap and are established and destroyed explicitly using malloc and free in C language.

To get the memory range of the memory objects, the size of the memory objects should be computed. Here we define sizeof() as the function to return size of the memory objects. Thus, suppose the memory object is of type T , we have,

• sizeof(T)=sizeof(basic type), if T is a scalar, such as int or double.

• sizeof(T)=n×sizeof(Telement), if T is an array, and n is number of elements, and the T element is the type

of the element.

• sizeof(T)=sum(sizeof(Ti)), if T is struct, and Ti is sub-type of T.

• sizeof(T)=max(T1,T2,…,Tn), if T is union, and Ti is the element in the union.

By this basic definition, the sizes of all types in the source codes could be computed recursively. The end address of the memory object is the sum of start address and the type size. Every memory access is represented by load and store intermediate representation of the compiler. Extra trace codes are hooked in the load and store locations. When the compiler intermediate representation instructions are executed, the address, access size and instruction count are recorded to a text file. Besides, in the beginning of the program execution, global variable allocation is stored in the memory trace file, including variable name, start address, end address and instruction count. The instruction counts of global variables are always zero. In the beginning of every function execution, the stack variable allocation is stored in the memory trace file, with the same format of global variable allocation. After the bit codes finish execution, the memory trace file contains all the allocation information and load-store information, which are sufficient for memory trace analysis.

A dynamic symbol look-up table is generated by analysing the trace file. If the address range of a memory object is overlapped with a previous one along the execution path. The previous memory object is marked as dead.

(4)

The live time of a memory object is defined as the first store to the memory object and the last load to the memory object. Pointer alias problem makes it not clear and reliable to save variable names in the trace file. Instead, only access addresses are saved. The addresses are then compared to the start address and end address of the symbol table to determine the variable accessed.

The variable table in constructed using only allocation information. If the address range of the new memory object is overlapped with the old one, the old one is marked invalid in the symbol table at the instruction count of the allocation of the new memory object.

1.2. Minimum memory requirement

A memory object is not used before the first load instruction and the last store instruction. Global variables are always initialized, that is to say, a store instruction always follows a global variable allocation. Thus the access patterns of memory objects are divided into the following categories:

• If all memory accesses to the memory object are store without load, the live interval is marked as zero. • If the first access to the memory object is load, this is marked as illegal, and the source code needs to

be changed.

• If accesses to the memory object include both store and load, and the first access is store, the live interval is the interval between the first store and last load.

By this rule, we can compute the live intervals of all the memory objects. The live interval is represented by the instruction counts of intermediate representation. Summing all the sizes of the live memory objects at a particular instruction count, the memory requirement at that instruction count can be computed. Iterating over all the instruction counts, the memory requirement diagram is able to be plotted. The algorithm is described using Pseudo code in Fig 2(a).

3. Implementation

The memory profiler is implemented in Low Level Virtual Machine (LLVM). LLVM has an interpreter execution engine. Extra instructions are hooked at load interpreter and store interpreter, to dump the load and store information to a text file in the sequence of program execution. The memory traces are processed using a Python script to find out memory requirement. The memory requirement diagram is finally plotted in Python.

4. Simulation and results

To evaluate the results, a computing kernel 1024-pt fast Fourier transform is tested by the profiling tool. The memory requirement diagram is plotted in Fig. 2(b). The horizontal axis represents instruction counts, and the vertical axis represents memory requirement at the instruction count in Byte. By analyzing the diagrams, the peak point (6223 Bytes) is the diagram is the minimum memory requirement. The system should have at least 6223 Bytes memory.

5. Conclusions and future work

This paper presents a memory profiling tool for ASIP memory design. The profiler is based on LLVM framework to execute the source codes by interpreter. The allocation and access information of memory objects are recorded to a text file. The minimum live internal of memory objects are calculated and a memory usage diagram is plotted. In future work, more complex case studies will be performed to verify the usefulness and correctness of the profiler. This work only estimates requirements of main memory. A

(5)

memory partition algorithm should be proposed for memory hierarchy design, with both scratch-pad memory and main memory.

Fig. 2. (a)Algorithm for memory requirement diagram ; (b)Memory requirement diagram of FFT

References

[1] D. Liu, Embedded DSP Processor Design, : Application Specific Instruction Set Processors (Systems on Silicon),2005. [2] K. Keutzer, S. Malik, A. Newton, From asic to asip: the next design discontinuity, in: Computer Design: VLSI in Computers and Processors, 2002. Proceedings. 2002 IEEE International Conference on, 2002, pp. 84 – 90. doi:10.1109/ICCD.2002.1106752.

[3] P. Karlstro andm, W. Zhou, D. Liu, Implementation of a floating point adder and subtracter in nogap, a comparative case study, in: Embedded and Ubiquitous Computing (EUC), 2010 IEEE/IFIP 8th International Conference on, 2010, pp. 68 –72. doi:10.1109/EUC.2010.20.

[4] K. Karuri, M. Al Faruque, S. Kraemer, R. Leupers, G. Ascheid, H. Meyr, Fine-grained application source code profiling for asip design, in: Design Automation Conference, 2005. Proceedings. 42nd, 2005, pp. 329 – 334. doi:10.1109/DAC.2005.193827.

References

Related documents

The purpose of this report has been to provide a synthesis or appraisal of results from large-scale national and international assessments in Australia over the past 20 years

• If the two-stage thermostat changes the call from low heat to high heat, the integrated control module will immediately switch the induced draft blower, gas valve, and

Although environmental and climatological variables contribute to explaining body size variation in some bruchines, the effects of relevant seed traits, such as size (Fox et al.

The purpose of this study is to evaluate the relationship of perceived emotional/ informational social support to self-care maintenance, self-care management, and self-care

Collectively, these results revealed that the repeated sequences present at the 3′ -UTR of SINV genome may regulate the translation of gmRNA selectively in mosquito, but not

ระดับสถานการณ์ในการเรียน (Learning Situation Level) หมายถึง สถานการณ์ที่เกี่ยวกับการเรียนการ สอนในห้องเรียน แบ่งเป็น 3 องค์ประกอบ ได้แก่

PRA Health Sciences has established consistent processes and clear accountabilities for literature surveillance to identify adverse drug reactions, assess the expeditability of

Energy sorghum is a forage sorghum bred for high biomass production. Elu- cidation of the flowering time gene regulatory network and identification of