Performance Profiling of the system with Two-Level Hardware Interrupts


CHANDRAKANTH PABOLU*

School of Computing Sciences VIT University, Vellore, India

M.NARAYANAMOORTHY*

M.RAJASEKHARA BABU*

Abstract:

In this paper, we explain the importance of interrupts and propose a system built on two-level hardware interrupts. We identify the components affected by this architecture and propose optimizations. We then propose a system that combines a real-time kernel and a time-sharing OS (Operating System) kernel using two-level hardware interrupts, and discuss the important implementation issues. We also propose a mechanism for evaluating the performance of the system by introducing special hooks into it. We conduct experiments on a set of driver versions and observe the performance variations. The results show that the performance of the system can be increased.

Keywords: two level hardware interrupts, hybrid operating system, Linux.

1. INTRODUCTION

When an interrupt occurs, the state of the interrupted process is saved by hardware, by software, or by a combination of both. This state generally consists of the program counter (PC), the registers, and memory. The saved state should normally satisfy the following conditions.

a) All instructions preceding the instruction indicated by the PC have been executed and have modified the process state correctly.

b) No instruction following the one indicated by the PC has been executed or has modified the process state.

c) If an exception condition raises the interrupt, the saved PC should point to the interrupted instruction.

The input/output subsystem increasingly affects overall system performance as semiconductor technology moves processor performance into the multi-gigahertz range. Interconnection networks and multimedia content, in particular, require efficient transfer of data into and out of computer systems.


Several factors are involved in measuring the interrupt cost. Because interrupt handling involves a large amount of kernel operation, its cost cannot be observed directly at the application level like other performance characteristics. Interrupts are also not easily controlled by application-level software, as they respond to asynchronous events. By measuring the cost of interrupt handling, various micro-architectural aspects such as instruction decode and graduation counts, branch prediction behavior and cache misses can be estimated.

To keep development and maintenance costs low, a commodity OS such as Linux [5] is chosen. A hybrid system is then built on it, combining a real-time subsystem and a time-sharing subsystem, so that both real-time and non-real-time tasks can be run. The problem is that such an OS is designed for general-purpose applications.

In the past, several methods [1, 3, 4, 6] were proposed to separate real-time from non-real-time interrupts. This is achieved in the interrupt-handling code on top of one-level interrupts, but we found that this causes problems. We chose RTAI, an open-source hybrid system based on Linux, as the representative of commodity-OS-based hybrid systems.

In RTAI, the Linux OS kernel is treated as the idle task; it executes only when there are no real-time tasks to run and the real-time kernel is inactive. To achieve this, the interrupt-handling code is modified so that the Linux task can never block real-time interrupts [9]. This method causes several problems:

a) The Linux task disables interrupts in interrupt handlers and critical sections. In RTAI this is emulated by setting a flag in the interrupt-handling code that marks non-real-time interrupts as disabled. Because the flag exists only in the interrupt-handling code, every interrupt still has to pass through that code in full, which degrades performance considerably.

b) Several functions have to be written for identification and emulation. This increases the size of the interrupt-handling code. At the same time, the number of interrupt requests also increases.

c) The hardware abstraction layer (HAL) has to be rewritten, which makes porting to different processors complex.

We propose to implement hybrid operating systems based on two-level hardware interrupts to solve these problems. Better performance can be achieved when real-time [9] and non-real-time interrupts are separated by hardware. Our focus is to minimize the real-time interrupt latency and to enhance the performance of the time-sharing subsystem. We have chosen the ARM architecture [7], which provides two-level hardware interrupts with different interrupt request entries. This architectural support improves the performance of both the time-sharing and the real-time subsystems. The main contributions are:

a) We discuss the implementation of a hybrid system based on two-level hardware interrupts, in which a real-time kernel is combined with a time-sharing OS kernel, and analyze the interrupt latency.

b) We implement a hybrid system called RTLinux-THIN (Real-Time LINUX with Two-level Hardware INterrupts) on the ARM architecture [7] by combining ARM Linux kernel 2.6.9 and μC/OS-II. Both are widely used kernels in the embedded and real-time fields with source code available (ARM Linux for time-sharing systems and μC/OS-II for real-time systems).

c) We use the Intel PXA270 processor [8] and design two versions of a driver, one faulty and one faultless. We then capture and analyze the number of interrupts generated by the two versions.

2. BACKGROUND

In this section we present the importance of interrupts and interrupt handling in hybrid systems. We then analyze interrupt handling and the worst-case real-time interrupt latency in RTAI.


IDR and finally reaches the ISR. Here the execution or rescheduling of the interrupts takes place. So, the mechanism has to be changed to separate real-time [9] from non-real-time interrupts, since they are processed differently. We then have to solve the interrupt-disabling problem when dealing with non-real-time interrupts.

In the hybrid system the time-sharing system has the lowest priority, so it cannot preempt real-time interrupts. On most processors, interrupts are disabled by setting a mask bit in the PSW (Program Status Word) register. The time-sharing system uses this method in interrupt handlers, critical sections, and so on, so we cannot rely on it in a hybrid system. Currently, the general approach to this problem is to emulate such interrupt control in software.

2.1 Interrupt Handling in RTAI

RTAI is a Linux-based hybrid system in which the Linux kernel is treated as the idle task; it executes only when there are no real-time tasks to run and the real-time kernel is inactive. In RTAI, a software emulation method called a virtual interrupt controller is used in the interrupt distribution code to solve the interrupt-disabling problem.

As shown in Figure2, when the Linux task disables interrupts, the disable bit is set in the virtual (software) interrupt controller instead of in the interrupt enable/disable bit of the PSW.

When an interrupt request reaches the interrupt distribution code, we first decide whether it is a real-time or a non-real-time interrupt. A real-time interrupt directly calls the corresponding ISR. For a non-real-time interrupt, the interrupt enable/disable bit in the virtual interrupt controller is checked. If the bit is not set, the corresponding ISR is called. If the bit is set, the interrupt is acknowledged and stored but not serviced. When the bit is later cleared, the stored interrupts are serviced.

Figure2: Interrupt processing flow of the interrupt distribution routine

This method causes many problems. Because it is a software emulation, the processor still responds to non-real-time interrupts even when the bit is set to disable them. These unnecessary responses by the CPU add overhead.
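The dispatch flow described in this subsection can be sketched as a small software model. This is only an illustration under stated assumptions, not RTAI code: the class name `VirtualInterruptController`, its methods, and the interrupt numbers are hypothetical, and ISR calls are replaced by entries in a log.

```python
from collections import deque


class VirtualInterruptController:
    """Toy model of the software-emulated interrupt control (hypothetical names)."""

    def __init__(self, realtime_irqs):
        self.realtime_irqs = set(realtime_irqs)
        self.linux_irqs_disabled = False  # virtual disable bit, not the PSW bit
        self.pending = deque()            # non-real-time interrupts held back
        self.log = []                     # stands in for calling the ISRs

    def _call_isr(self, irq, kind):
        self.log.append((kind, irq))

    def dispatch(self, irq):
        """Runs for every interrupt, even while Linux has 'disabled' interrupts."""
        if irq in self.realtime_irqs:
            self._call_isr(irq, "rt")     # real-time: service immediately
        elif self.linux_irqs_disabled:
            self.pending.append(irq)      # acknowledged and stored, not serviced
        else:
            self._call_isr(irq, "linux")

    def linux_disable(self):
        self.linux_irqs_disabled = True

    def linux_enable(self):
        self.linux_irqs_disabled = False
        while self.pending:               # replay what was stored while disabled
            self._call_isr(self.pending.popleft(), "linux")


vic = VirtualInterruptController(realtime_irqs={1})
vic.linux_disable()
vic.dispatch(5)     # non-real-time: stored
vic.dispatch(1)     # real-time: serviced at once
vic.linux_enable()  # stored interrupt 5 is serviced now
print(vic.log)
```

Note that `dispatch` still runs for interrupt 5 while Linux has interrupts disabled; this is the extra CPU work that the software emulation cannot avoid.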


summation of the waiting time and the interrupt processing time. The processing time can be further divided into two parts: the distribution part and the interrupt service part. Based on this division, we derive the real-time interrupt latency for interrupt Ik.

From Figure3, the total processing time is the sum of the distribution time and the service time, denoted as:

\[ P(K) = T_D(K) + T_S(K) \]

Figure3: Interrupt Division in the interrupt distribution routine

$T_D$ is the time taken to reach the corresponding service routine from the interrupt source. $T_S$ is the time taken to complete the service request.

The worst-case latency is the sum of the distribution time and the worst-case waiting time, denoted as:

\[ \text{Worst-case execution time}(I_K) = T_D(K) + \text{worst-case waiting time} \]

The worst-case waiting time depends on the following factors.

a) Interrupt priority: before interrupt $I_K$ can be processed, the preceding interrupts $I_1, I_2, \ldots, I_{K-1}$ have to be completed. This waiting time is given by

\[ \sum_{i=1}^{K-1} T(i) = \sum_{i=1}^{K-1} \bigl( T_D(i) + T_S(i) \bigr) \]

b) Interrupt disabling: one of the following conditions may hold.

i) The system may be in the distribution code, so the interrupt has to wait for $\max(T_D)$.

ii) The system may be in a service routine, so the interrupt has to wait for $\max(T_S)$.

iii) The system may be in a critical section, so the interrupt has to wait for $\max(T_C)$.

iv) The system may be executing a trap instruction, so the interrupt has to wait for $\max(T_T)$.

Since the system can be in only one of these states at a time, the resulting waiting time is $\max(\max(T_D), \max(T_S), \max(T_C), \max(T_T))$.

\[ \text{Worst-case execution time}(I_K) = T_D(K) + \sum_{i=1}^{K-1} T_D(i) + \sum_{i=1}^{K-1} T_S(i) + \max\bigl( \max(T_D), \max(T_S), \max(T_C), \max(T_T) \bigr) \]
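To make the bound concrete, the following sketch evaluates it for a small set of interrupts. The function name `worst_case_latency` and all timing values are hypothetical and chosen only for illustration (they are not measurements from this work); interrupts are assumed to be listed in priority order, with index 0 corresponding to I1.

```python
def worst_case_latency(k, t_d, t_s, t_c_max, t_t_max):
    """Worst-case bound for interrupt I_k (k is 1-based).

    t_d[i], t_s[i]: distribution and service time of interrupt I_(i+1).
    t_c_max, t_t_max: longest critical section and longest trap instruction.
    """
    # Waiting for the preceding interrupts I_1 .. I_(k-1).
    preceding = sum(t_d[i] + t_s[i] for i in range(k - 1))
    # Blocking: the system is in exactly one of the four states.
    blocking = max(max(t_d), max(t_s), t_c_max, t_t_max)
    return t_d[k - 1] + preceding + blocking


# Hypothetical times in microseconds.
t_d = [2, 3, 4]
t_s = [10, 20, 30]
print(worst_case_latency(3, t_d, t_s, t_c_max=15, t_t_max=40))
```

With these example values the blocking term is dominated by the trap time (40), so any mechanism that removes that term from the maximum lowers the bound for every interrupt.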

2.2 Implementing the system with two-level interrupts


Figure4: Interrupt request handling procedure after modifying the system with 2 level hardware interrupts

Because the hardware separates the interrupts, they do not need to be separated in software. Real-time interrupts go to the real-time kernel and non-real-time interrupts are passed to the time-sharing subsystem. Disabling non-real-time interrupts becomes much simpler: the interrupts entering the time-sharing subsystem are simply ignored, while those entering the real-time subsystem are still processed, so disabling non-real-time interrupts does not affect real-time interrupts. Devices get a quicker response and system performance increases. Because the hardware performs the separation, the interrupt distribution code also becomes smaller.
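The effect of hardware separation can be illustrated with a minimal model of the two ARM interrupt-disable bits. In the ARM CPSR, bit 7 (I) masks IRQ and bit 6 (F) masks FIQ. The class `TwoLevelCpu` and its methods are hypothetical, and the sketch assumes that the time-sharing kernel only ever touches the I bit.

```python
I_BIT = 1 << 7  # ARM CPSR: IRQ disable bit
F_BIT = 1 << 6  # ARM CPSR: FIQ disable bit


class TwoLevelCpu:
    """Toy model of a CPU with two hardware interrupt levels (hypothetical)."""

    def __init__(self):
        self.cpsr = 0
        self.serviced = []

    def linux_disable_interrupts(self):
        self.cpsr |= I_BIT   # assumption: Linux masks IRQ only, never FIQ

    def linux_enable_interrupts(self):
        self.cpsr &= ~I_BIT

    def raise_irq(self, source):
        """Non-real-time request; returns True if the CPU takes it."""
        if self.cpsr & I_BIT:
            return False     # masked in hardware: no software runs at all
        self.serviced.append(("IRQ", source))
        return True

    def raise_fiq(self, source):
        """Real-time request; unaffected by the I bit."""
        if self.cpsr & F_BIT:
            return False
        self.serviced.append(("FIQ", source))
        return True


cpu = TwoLevelCpu()
cpu.linux_disable_interrupts()
cpu.raise_irq("disk")    # not taken while Linux has IRQ masked
cpu.raise_fiq("timer")   # still taken immediately
print(cpu.serviced)
```

Unlike the software emulation, no distribution code executes for the masked request, which is where the saving comes from.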

In the worst case, the Linux task can be preempted, so the waiting time caused by the trap instruction ($T_T$) is removed. Therefore the above equation becomes

\[ \text{Worst-case execution time}(I_K) = T'_D(K) + \sum_{i=1}^{K-1} T'_D(i) + \sum_{i=1}^{K-1} T'_S(i) + \max\bigl( \max(T'_D), \max(T'_S), \max(T'_C) \bigr) \]

3. SYSTEM IMPLEMENTATION

By combining the ARM Linux kernel 2.6.9 [2] with μC/OS-II and introducing two-level hardware interrupts, we formed RTLinux-THIN (Real-Time LINUX with Two-level Hardware INterrupts). Here we describe the details of the Linux kernel.

The ARM architecture provides two types of interrupts: IRQ (Interrupt Request) and FIQ (Fast Interrupt Request). IRQ is used for non-real-time interrupts, whereas real-time interrupts are assigned to FIQ. FIQ has higher priority than IRQ and can preempt it.

Figure5 shows the high-level system architecture. Non-real-time interrupts are handled by the Linux kernel and real-time interrupts are handled by the μC/OS-II scheduler.

The Linux kernel is treated as the idle task and executes only when no real-time interrupts are present. The priority is maintained as follows: real-time interrupts > real-time tasks > non-real-time interrupts >


4. EXPERIMENT SETUP

4.1 Environment

We used the Intel PXA270 processor [8], which complies with the ARM architecture [7] and runs at 520 MHz, with 64 MB of SDRAM and a 32 KB I-cache. We used a VGA monitor with 32-bit color depth and 1024 x 768 resolution to view the output. The operating system was RTLinux-THIN, the combination of the Linux kernel and μC/OS-II.

4.2 Additional Equipment

Two devices, an EDID board and a DVMU board, and a CRT monitor are required for the test. Extended Display Identification Data (EDID) is a mechanism that supports monitor plug and play using data fields in an EEPROM located in the monitor. The data stored in this EEPROM identifies the characteristics, features and video timing modes supported by the display. The EDID emulator board helps in plugging and unplugging CRT, LFP and eDP panels. The DVMU board simulates HDMI, DP or DVI displays. The CPU communicates with these boards as if they were real physical displays.

4.3 Driver Code Changes

Figure6 shows the changes required in the driver. We introduced a flag [11] in the driver that specifies whether to start or stop recording interrupts. When the first interrupt arrives, 2 MB of memory is allocated to store the interrupt records.

When an interrupt arrives, the driver checks whether the flag is set. If it is not set, normal processing is done. If it is set, the driver checks whether the cache is full. If it is full, the previous entries are cleared; otherwise the interrupt information is stored. In this way all interrupt information is stored until the flag is set to false, which happens when the application requests that recording stop. The driver then passes the data to the application in 256-byte chunks until all of it has been transferred.

So, flags have to be introduced in the driver code and memory has to be allocated for storing the data. Deadlock issues also have to be handled.
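The recording logic of this subsection can be sketched in user-space Python as follows. This is a simplified model, not the driver code: the class `InterruptRecorder`, the 12-byte record layout (interrupt id and timestamp), and the choice to store the current record right after a full buffer is cleared are all assumptions. Locking, which a real driver needs to avoid the deadlock issues mentioned above, is omitted.

```python
import struct

BUFFER_BYTES = 2 * 1024 * 1024  # 2 MB buffer, as in the text
CHUNK_BYTES = 256               # transfer unit, as in the text
RECORD = struct.Struct("<IQ")   # hypothetical record: interrupt id, timestamp


class InterruptRecorder:
    """Toy model of the recording hooks added to the driver (hypothetical)."""

    def __init__(self, capacity=BUFFER_BYTES):
        self.capacity = capacity
        self.recording = False  # the start/stop flag set by the application
        self.buffer = None      # created on the first interrupt

    def start(self):
        self.recording = True

    def stop(self):
        self.recording = False

    def on_interrupt(self, irq_id, timestamp):
        if self.buffer is None:
            self.buffer = bytearray()  # stands in for the 2 MB allocation
        if not self.recording:
            return                     # flag not set: normal processing only
        if len(self.buffer) + RECORD.size > self.capacity:
            self.buffer.clear()        # buffer full: drop the previous entries
        # Assumption: the current interrupt is stored after the clear.
        self.buffer += RECORD.pack(irq_id, timestamp)

    def read_chunks(self):
        """Yield the recorded data in chunks of at most 256 bytes."""
        data = bytes(self.buffer or b"")
        for offset in range(0, len(data), CHUNK_BYTES):
            yield data[offset:offset + CHUNK_BYTES]


rec = InterruptRecorder()
rec.on_interrupt(7, 0)   # flag not set: nothing is recorded
rec.start()
for t in range(100):
    rec.on_interrupt(7, t)
rec.stop()
chunks = list(rec.read_chunks())
print(len(chunks), sum(len(c) for c in chunks))
```

Clearing the whole buffer when it fills keeps the interrupt path short, at the cost of losing earlier records; a ring buffer would be an alternative design if the older data matters.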

4.4 Performance Profiling

While device drivers are being developed, many modifications are made day by day. A modification made in one module may affect others because of the dependencies between modules. This can lead to an interrupt storm.


Figure6: Exchange of data between driver and application

The interrupt automation tool is tested with two versions of the driver, one faulty and one non-faulty. We compare the total number of interrupts between them and present the results.

4.5 Experiment Setup

Comparing the number of interrupts between them shows that the faulty driver produces more interrupts than expected. By examining a particular interrupt type, we can go to the module where that interrupt is generated and correct the code. In this way the performance is increased.

Connect the EDID board, the DVMU board and the CRT to the Intel processor. Install the drivers required for identifying and pairing the boards. The faulty driver is unable to set the registers in the processor when displays are plugged or unplugged. We therefore developed a program, named HotPlugUnPlug.exe, which plugs and unplugs the HDMI and DP displays 100 times.

4.6 Screenshots and results

Install the faultless driver in the system. Give the executable, HotPlugUnPlug.exe, as input to the interrupt automation tool and run the tool until the program completes. The tool records all types of interrupts generated when the displays are plugged or unplugged. The results are stored as an XML file and can be reused later; we treat this as the golden data. Figure7 shows the results for the golden data.

Now install the faulty driver, which generates an interrupt storm because registers are set improperly when displays are plugged and unplugged. Repeat the previous steps by running the tool. This generates the test XML.
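The comparison of the golden XML with the test XML can be sketched as below. The XML layout (an `interrupt` element with `type` and `count` attributes), the interrupt type names, the counts, and the 1.5x tolerance are all hypothetical, since the file format of the tool is not described here; the sketch only shows the idea of flagging interrupt types whose count is far above the golden data.

```python
import xml.etree.ElementTree as ET

# Hypothetical file contents; the real tool's schema may differ.
GOLDEN_XML = """
<interrupts>
  <interrupt type="hotplug" count="200"/>
  <interrupt type="vblank" count="5000"/>
</interrupts>
"""

TEST_XML = """
<interrupts>
  <interrupt type="hotplug" count="9000"/>
  <interrupt type="vblank" count="5100"/>
</interrupts>
"""


def load_counts(xml_text):
    """Return {interrupt type: count} from one recording."""
    root = ET.fromstring(xml_text)
    return {e.get("type"): int(e.get("count")) for e in root.iter("interrupt")}


def find_excess(golden, test, tolerance=1.5):
    """Return the types whose test count exceeds tolerance x the golden count."""
    return {
        irq_type: (golden.get(irq_type, 0), count)
        for irq_type, count in test.items()
        if count > tolerance * golden.get(irq_type, 0)
    }


excess = find_excess(load_counts(GOLDEN_XML), load_counts(TEST_XML))
for irq_type, (expected, seen) in excess.items():
    print(f"{irq_type}: golden={expected} test={seen}")
```

Each flagged type points to the module whose register handling should be inspected first.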


Figure7: Figure showing captured golden data

Figure8: Comparison of interrupts produced from defect and faultless driver

Figure9: Comparison of interrupts count produced from defect and faultless driver before code changes

[Chart: number of interrupts (axis 0 to 500,000) after 10, 15, 20 and 30 min for the faultless driver and the defect driver]


Figure10: Comparison of interrupts count produced from defect and faultless driver after code changes

Figure9 shows the variation in the total number of interrupts for specific time intervals. For each interrupt type, we explored the module in which the registers corresponding to that interrupt are updated. After the code was modified, the total number of interrupts decreased, so system performance increased. The interrupt counts produced after modifying the driver are compared in Figure10.

5. CONCLUSION

In this paper, we first explained the importance of interrupts and then described the hybrid system. We discussed the key implementation points and the issues faced in designing the system. We then extended RTLinux, a commodity OS, with two-level interrupts for the Intel PXA270 processor [8].

We then proposed a mechanism to evaluate the performance of the system by introducing hooks into it. We conducted experiments on a set of driver versions and observed the performance variations. Based on these variations, the corresponding code was modified. The results show that the performance of the system can be increased.

References

[1] A. K. Mok, X. A. Feng, and D. Chen. Resource partition for real-time systems. In RTAS ’01: Proceedings of the Seventh Real-Time Technology and Applications Symposium (RTAS ’01), page 75, Washington, DC, USA, 2001. IEEE Computer Society.

[2] D. P. Bovet and M. Cesati. Understanding the Linux Kernel, Second Edition. O’Reilly & Associates, Inc., 2002.

[3] K. W. Batcher and R. A. Walker. Interrupt triggered software prefetching for embedded cpu instruction cache. In RTAS ’06: Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS’06), pages 91–102, Washington, DC, USA, 2006. IEEE Computer Society.

[4] C. Kirsch, M. Sanvido, and T. Henzinger. A programmable microkernel for real-time systems. In Proc. ACM/USENIX Conference on Virtual Execution Environments (VEE). ACM Press, 2005.

[5] Y. Zhang and R. West. Process-aware interrupt scheduling and accounting. Real-Time Systems Symposium, 2006. RTSS ’06. 27th IEEE International, pages 191–201, 2006.

[6] X. A. Feng and A. K. Mok. A model of hierarchical realtime virtual resources. In RTSS ’02: Proceedings of the 23rd IEEE Real-Time Systems Symposium (RTSS’02), page 26, Washington, DC, USA, 2002. IEEE Computer Society.

[7] D. Seal. ARM Architecture Reference Manual, 2nd Edition. Addison-Wesley, Nov. 2000.

[8] Intel, Inc. Intel PXA27x Processor Family Developer’s Manual, Jan 2006.

[9] G. Li, S. Yuen, and M. Adachi. Schedulability Analysis of Real-Time Systems with Nested Interrupts (extended version). http://www.agusa.i.is.nagoyau.ac.jp/person/li.g/pa-per/Interrupt-Full.pdf, 2008.

[10] D. Heller. Rabbit: A performance counters library for intel/amd processors and linux. Technical report, Scalable Computing Laboratory, Ames Laboratory, U.S. D.O.E., Iowa State University, October 2000.

[11] B. Sprunt. Brink and abyss: Pentium 4 performance counter tools for linux. Technical report, Electrical Engineering Department, Bucknell University, 2002.
