Abstract - Embedded systems are often portable, battery-powered devices with a limited power budget, so most of them must meet an energy constraint. Performance and energy consumption are therefore among the most important metrics in embedded system design, and estimating and validating them is essential. This paper attempts to measure software energy consumption precisely on an ARM Cortex M4 processor using three estimation methods. The results are validated with five benchmark programs. The tedious calculation of inter-instruction cost is avoided by taking it as a fixed percentage of the total energy. With instruction-wise estimation, the percentage error between actual and estimated energy is found to be less than 5%.
Index Terms— Current measurement, Embedded system, Energy estimation, Software energy.
I. INTRODUCTION
An embedded system is an electronic or computer system designed to perform a specific task or a particular sequence of tasks, completely or partially independent of human intervention, and to do so as efficiently as possible. Embedded systems are not always standalone devices. They have very limited resources, particularly memory, and generally have no secondary storage. They cannot be programmed to perform anything other than the tasks for which they are designed. Embedded systems are also power constrained: because many of them are battery powered, power consumption has to be very low. Many optimization techniques are available for embedded system design, focusing on cost, power, and area. There is always a tradeoff between these design metrics, which the designer has to manage carefully to improve the overall performance of the system. With the increasing complexity of functionality and end users' expectation of longer battery life, power has become a critical design parameter. Complexity is being added at a much higher rate than battery technology is improving. The end user expects more time between successive recharges, and this is possible with power-aware design at various levels. Accurate models of software power, that is, the energy consumed by the processor during software execution, are essential for designing power-optimal software. As software energy contributes significantly to system energy, accurate energy estimation is necessary for system energy optimization.
Published on February 25, 2019.
V. A. Kulkarni is a Research Scholar at GIT, Belgavi, India.
Power consumption models of processor software can be categorized as low-level models and high-level models.
Low-level models are also called hardware models. Power and energy are calculated from detailed electrical descriptions at the circuit level, gate level, register transfer (RT) level, or system level. High-level models deal only with instructions and functional units from the software point of view, without electrical knowledge of the underlying architecture.
In this paper, a precise approach to software energy estimation for the ARM Cortex M4 processor is presented. The rest of the paper is organized as follows. Section II reviews the issues related to processor power measurement. Section III presents the experimental setup and results. Validation of the results is discussed in Section IV, and Section V concludes the paper.
II. ISSUES IN PROCESSOR POWER MEASUREMENT
The two main approaches to estimating the energy consumption of an embedded system are simulation based and measurement based. The simulation-based approach uses models that relate power consumption to program instructions. Its drawback is that models are not available for every modern processor, and those that are available are expensive. The second approach is based on physical measurement of power consumption. Measurement is the only way to verify the correctness of the simulation-based approach, and is therefore important for validating power consumption models.
Measurement methods can be averaged or cycle accurate, depending on the time interval over which energy is estimated. The most commonly used method is to measure the voltage drop across a shunt resistor inserted in the power supply line. The resistor value should be very small so that its effect on the total current is minimal. Since the resulting voltage drop is very small, a suitable voltage amplifier needs to be used, and the reading is taken with a digital multimeter (DMM) or a data acquisition (DAQ) tool. The current shunt method is used at very low current ranges, for example to measure the current of an MSP430 (microamperes) [1] and of wireless communication modules such as Bluetooth (milliamperes) [2].
G. R. Udupi is with Computer Science and Engineering, SGBIT, Belgavi, India. (e-mail: [email protected]).
Instruction Level Energy Consumption Estimation of Embedded Processor
V. A. Kulkarni and G. R. Udupi
A DMM is used to measure the current consumption of a 486 microprocessor in [3]. Cycle-accurate power consumption measurement is described in [4], where a 0.1 Ohm shunt resistor is used along with a differential amplifier. The current shunt method has two drawbacks: i) the shunt resistor sits directly in the supply line and affects the total current, and ii) the voltage drop across the shunt resistor reduces the voltage available to the processor core. Both are overcome by using a Wilson current mirror [5], which duplicates the current drawn by the embedded system; the mirrored current is then measured using a shunt resistor. Current probes are also used for microprocessor power estimation [6]. Their advantage is that the supply lines need not be cut to insert a resistor, but because probes are expensive, the current shunt method is preferred. The charge transfer method is used to find cycle-accurate energy consumption [7]. However, the measurement setup requires a DAQ with a high sampling frequency and is therefore more complex to implement. The battery voltage drop after executing software is also a measure of the energy consumed [8]. The methods described above use external meters to measure power. Another method is to integrate power sensors into the hardware architecture. This approach gives more accurate results because the sensors are integrated into the board.
The issues in instruction-level power estimation are the calculation of base cost, the calculation of inter-instruction cost, energy-sensitive factors, and the method of processor current measurement [9]. In this paper, base cost is measured by executing the same instruction 1000 times inside an infinite loop, which minimizes the effect of the branch instruction in the loop, and the average current is considered. The effect of the number of 1's is very small and can be neglected [10]. All other factors are accounted for as a fixed percentage of the total energy. Onboard (integrated) current measurement is used to find the average current.
III. EXPERIMENTAL SETUP AND RESULTS
A. Experimental setup
The on-board current measurement circuit of the ARM Cortex M4 board is used, which increases measurement accuracy and overcomes many of the limitations of the current measurement methods reported in the literature. It consists of a MAX9634T current monitor chip and a 12-bit ADC sampling at 50 ksps to 200 ksps. The MAX9634T multiplies the sense voltage by 25 to provide a voltage range suitable for the ADC to measure. This onboard current measurement is used for energy calculation.
The ARM Cortex-M4 is a 32-bit core with a 3-stage pipeline and Harvard architecture. A sample rate of 200 ksps (5 µs period) is chosen for all measurements, and the average current over a period of 1 second is used for energy calculation. Each instruction takes only a few cycles to execute, so it is very difficult to measure the current of a single instruction in such a short period. As reported in the literature, the usual method is to run the given instruction a few thousand times in an unconditional loop so that the average current can be measured. BL cost denotes the cost of the branch instruction (BL) that closes the loop. To minimize the effect of this branch instruction, the given instruction needs to be executed many times before branching.
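The conversion from raw ADC samples to an average current can be sketched as follows. This is only an illustration under stated assumptions: the gain of 25 and the 12-bit resolution come from the text, while the ADC reference voltage `vref_v`, the shunt resistance `rshunt_ohm`, the full-scale divisor 4095, and both function names are hypothetical and not taken from the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADC_FULL_SCALE 4095.0  /* largest 12-bit code (assumed scaling) */
#define SENSE_GAIN     25.0    /* MAX9634T gain stated in the text */

/* Convert one raw ADC code to load current in mA.
 * vref_v and rshunt_ohm are assumed board parameters, not given in the paper. */
double adc_code_to_ma(uint16_t code, double vref_v, double rshunt_ohm)
{
    assert(code <= 4095 && vref_v > 0.0 && rshunt_ohm > 0.0);
    double v_adc = (code / ADC_FULL_SCALE) * vref_v;  /* voltage at the ADC input */
    double v_sense = v_adc / SENSE_GAIN;              /* voltage across the shunt */
    return (v_sense / rshunt_ohm) * 1000.0;           /* amperes to milliamperes */
}

/* Average current over a capture, e.g. 200000 samples for 1 s at 200 ksps. */
double average_current_ma(const uint16_t *codes, size_t n,
                          double vref_v, double rshunt_ohm)
{
    assert(codes != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += adc_code_to_ma(codes[i], vref_v, rshunt_ohm);
    return sum / (double)n;
}
```

Averaging over the whole capture, rather than inspecting individual samples, mirrors the 1-second averaging window described above.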
To find the base cost, each instruction is executed 1000 times in a loop [11] [12] [13]. This minimizes the effect of the “BL loop” instruction on the base cost. Calculating the inter-instruction cost involves a large number of measurements, given by n(n-1)/2, where n is the number of instructions in the instruction set architecture. For a microcontroller with 100 instructions, 4950 pairwise measurements have to be carried out to find the inter-instruction cost. This large volume of measurement is tedious and time consuming. From experiments it is found that, apart from the base cost, all other costs put together amount to about 5%. This 5% is accounted for in the estimated energy, which simplifies the estimation process to a great extent.
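The measurement count and the lumped overhead can be illustrated with a short sketch. The pair count follows the n(n-1)/2 formula in the text. The way the 5% is applied here, as a multiplicative overhead on the summed base-cost energy, is an assumption, since the paper does not spell out the exact form; both function names are hypothetical.

```c
#include <assert.h>

/* Number of unordered instruction pairs, n(n-1)/2, as given in the text. */
unsigned long pair_measurements(unsigned long n)
{
    return n * (n - 1UL) / 2UL;
}

/* Replace measured inter-instruction costs with a lumped overhead.
 * Assumption: the overhead is a fraction of the summed base-cost energy. */
double energy_with_overhead_nj(double base_energy_nj, double overhead_fraction)
{
    assert(base_energy_nj >= 0.0 && overhead_fraction >= 0.0);
    return base_energy_nj * (1.0 + overhead_fraction);
}
```

For 100 instructions, `pair_measurements(100)` gives the 4950 measurements quoted above, whereas the lumped approach needs only the base-cost measurements plus a single fraction such as 0.05.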
A typical embedded software program consists of two parts:
1) Initialization part: It configures system modules, initializes program variables, etc. This part of the program is executed only once, at the beginning, so that the system gets ready to perform its main operation.
2) Main part: It is usually implemented as an endless loop.
From an energy consumption point of view, one can ignore the initialization part and assume that the system always operates in its main part. This is because when an embedded system is turned on, it stays in the initialization phase for only a few microseconds and then enters the main phase, where it operates for hours. Almost all the energy consumption of an embedded system is therefore due to the main phase and not the initialization phase.
B. Instruction Energy Cost
Each assembly instruction is executed 1000 times in a loop to overcome the effect of the branch instruction, as shown in Figure 1. As before, the current is sampled at 200 ksps (5 µs period) and averaged over 1 second for energy calculation.
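int main(void) {
    /* Base-cost kernel for ADD: the instruction is repeated so that the
       loop branch contributes little to the average current. */
    __asm(
        "MOV R0, #0x00000001\n"       /* first operand */
        "MOV R1, R0\n"                /* second operand */
        "MOV R2, #0x00000000\n"       /* destination register */
        "loop: ADD r2, r0, r1\n"
        "ADD r2, r0, r1\n"
        "ADD r2, r0, r1\n"
        "ADD r2, r0, r1\n"
        /* ... ADD r2, r0, r1 repeated, 1000 times in total ... */
        "ADD r2, r0, r1\n"
        "BL loop"                     /* branch back; the loop never exits */
    );
}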
Fig. 1. Assembly program for instruction cost.
The average current for the assembly program shown in Figure 1 is found to be 3.217 mA. The core voltage is 3.3 V and the clock frequency is 12 MHz. Instruction energy is the product of i) the average current drawn during instruction execution, ii) the core voltage, iii) the time required for each cycle, and iv) the number of cycles needed to execute the instruction. The instruction cost calculation for the instruction shown in Figure 1 is given in Table I. Measurements are carried out on all possible variants of the different Cortex M4 instructions. The variants of the MOV instruction, along with their average currents, are given in Table II.
TABLE I: SAMPLE INSTRUCTION ENERGY CALCULATION
Average Current (mA) | Voltage (V) | Power (mW) | Time period (1/f) (µs) | No. of cycles | Energy (nJ)
3.217 | 3.3 | 10.6161 | 0.0833 | 1 | 0.884321
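The four-factor product described above can be written as a small helper. This is a minimal sketch; the function name and the choice of units (mA, V, MHz, giving nJ) are illustrative and not part of the paper.

```c
#include <assert.h>

/* Energy of one instruction in nJ:
 * average current (mA) x core voltage (V) x cycle time (us) x cycle count. */
double instruction_energy_nj(double current_ma, double voltage_v,
                             double clock_mhz, unsigned cycles)
{
    assert(current_ma >= 0.0 && voltage_v >= 0.0 && clock_mhz > 0.0);
    double power_mw = current_ma * voltage_v;      /* mA * V = mW */
    double period_us = 1.0 / clock_mhz;            /* 1/f, in microseconds */
    return power_mw * period_us * (double)cycles;  /* mW * us = nJ */
}
```

With the Figure 1 values (3.217 mA, 3.3 V, 12 MHz, 1 cycle) this gives about 0.8847 nJ. Table I uses the rounded period of 0.0833 µs, which yields 0.884321 nJ.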
TABLE II: Average current (mA) for MOV instruction
Instruction Average Current (mA)
mov r2, r1 2.606
mov r1,0xaaaaaaaa 3.282
mov.w r3, #3221225472 3.265
mov.w r8, r3, lsl #24 3.287
movcc r1, r5 2.819
movcc r0, #1 2.784
movcs r0, #0 2.75
movcs.w fp, #1 3.154
moveq r4, #0 2.741
moveq r5, r1 2.789
moveq.w r4, #512 3.104
movge r1, r5 2.959
movge.w r8, r6, asr #4 3.235
movge.w r0, #4294967295 3.411
movhi r0, #0 2.748
movls r0, #1 2.787
movlt r2, #34 2.8
movlt.w r8, #16 3.153
movne r4, #8 2.772
movne r4, r8 2.793
movpl.w r0, #4294967295 3.408
movs r3, #0 2.576
movs.w r9, r9, lsr #1 3.302
movt r1,0xaaaa 3.289
movw r3, #1022 3.289
IV. VALIDATION OF RESULTS
A. Benchmark used
The benchmarks used to validate the results are FDCT (fast discrete cosine transform), FIR (finite impulse response filter), JFDCTINT (discrete cosine transform on an 8x8 pixel block), and MATMULT (multiplication of two 20x20 matrices) from the WCET benchmarks, and STRING SEARCH from MiBench [14] [15] [16]. The WCET benchmarks were developed at Mälardalen University, Sweden, whose WCET research group maintains a large number of WCET benchmark programs. Each benchmark is provided as a C source file (file.c). MiBench consists of a set of 35 embedded applications for benchmarking purposes. These are divided into six suites, each targeting a specific area of the embedded market: Automotive and Industrial Control, Consumer Devices, Office Automation, Networking, Security, and Telecommunications. All the programs are freely available as standard C source code.
B. Grouping of Instructions
Details of all the assembly instructions executed are obtained from the instruction trace. A total of 3680 instructions from ‘jfdctint’, 5116 instructions from ‘fdct’, 3033 instructions from ‘string search’, 37564 instructions from ‘matmult’, and 233380 instructions from the ‘fir’ benchmark are traced. The energy of each instruction is calculated from its average current, as shown in Table I.
The percentage error between the estimated and actual energy consumption for all benchmarks is given in Table III.
TABLE III: % error for all benchmark – instruction wise
Benchmark % error
FIR 0.395418
JFDCTINT -0.5864
MATMULT 2.841366
STRING SEARCH 4.559685
FDCT -4.89428
C. Grouping of Instructions – function wise
ARM Cortex M4 instructions are broadly classified as general data processing instructions, memory access instructions, multiply and divide instructions, saturating instructions, packing and unpacking instructions, bit field instructions, branch and control instructions, and miscellaneous instructions. Variations of a given instruction are considered while assigning a base cost to the instruction.
The average current of each instruction group is shown in Figure 2.
Fig. 2. Average current for each group.
Table IV shows the percentage composition of instructions by function for all five benchmarks considered for validation. Once the grouping is done, the benchmark programs are executed. Instead of assigning a cost to each instruction type, a cost is assigned based on the group the instruction belongs to.
TABLE IV: % Composition of instructions
Instruction group | FIR | JFDCTINT | MATMULT | STRING SEARCH | FDCT
Memory access | 63 | 60 | 14 | 70 | 54
Gen. data processing | 26 | 33 | 71 | 20 | 34
Multiply & divide | 5 | 6 | 11 | - | 4
Branch & Control | 6 | 1 | 3 | 9 | 1
Packing & Unpacking | - | - | 1 | 1 | 7
The percentage error between the estimated and actual energy consumption for all benchmarks using the function-wise grouping method is given in Table V.
TABLE V: % error for all benchmarks – function wise
Benchmark % error
FIR 2.968463
JFDCTINT 2.144767
MATMULT 11.16153
STRING SEARCH 3.188004
FDCT -0.87062
D. Grouping of Instructions – Cycle wise
Another method of grouping the instructions is based on the number of cycles needed for execution. Instructions are classified as 1-cycle, 2-cycle, and 3-cycle instructions. The average current used for energy calculation is 3.111185 mA for single-cycle instructions, 2.782308 mA for two-cycle instructions, and 3.76 mA for three-cycle instructions. The number of cycles for PUSH and POP depends on the number of registers pushed or popped. To simplify the calculations, PUSH and POP are treated as 2-cycle instructions. The percentage composition of instructions by cycle count in all five benchmarks is shown in Table VI.
TABLE VI: % Composition of instructions
Cycles | FIR | JFDCTINT | MATMULT | STRING SEARCH | FDCT
1 cycle | 52 | 56 | 76 | 37 | 58
2 cycle | 48 | 42 | 17 | 45 | 40
3 cycle | - | 3 | 7 | 18 | 2
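The cycle-wise costing described above can be sketched as follows. The three average currents, the 3.3 V core voltage, and the 12 MHz clock are taken from the text; the function name and the idea of passing per-group instruction counts as an array are illustrative assumptions, and any counts used with it are hypothetical rather than measured.

```c
#include <assert.h>
#include <stddef.h>

#define CORE_VOLTAGE_V 3.3
#define CLOCK_MHZ      12.0

/* Average current (mA) for 1-, 2- and 3-cycle instructions, from the text. */
static const double GROUP_CURRENT_MA[3] = { 3.111185, 2.782308, 3.76 };

/* Energy in nJ of a program described by instruction counts per cycle group.
 * counts[k] is the number of executed (k+1)-cycle instructions. */
double cycle_group_energy_nj(const unsigned long counts[3])
{
    assert(counts != NULL);
    double total = 0.0;
    for (size_t k = 0; k < 3; ++k) {
        double cycles = (double)(k + 1);
        /* mA * V * cycles / MHz = nJ per instruction in this group */
        double per_instr = GROUP_CURRENT_MA[k] * CORE_VOLTAGE_V * cycles / CLOCK_MHZ;
        total += (double)counts[k] * per_instr;
    }
    return total;
}
```

Only three current values are needed with this grouping, instead of one per instruction variant, which is the simplification this method trades against accuracy.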
Once the grouping is done, the benchmark programs are executed. Instead of assigning a cost to each instruction type, a cost is assigned based on the number of cycles required for instruction execution. The percentage error between the estimated and actual energy consumption for all benchmarks using cycle-based grouping is given in Table VII.
TABLE VII: % error for all benchmark – cycle wise
Benchmark % error
FIR 5.698703
JFDCTINT 5.173162
MATMULT 17.46789
STRING SEARCH 17.03456
FDCT 1.417293
Experiments are carried out by three methods for all five benchmarks. In the first method, each instruction is considered with its own average current and number of cycles. In the second method, instructions are grouped by function. In the third method, instructions are grouped by the number of cycles required for execution.
The results obtained for all five benchmarks by the three methods, namely instruction-wise calculation, calculation based on grouping by function, and calculation based on grouping by number of cycles, are shown in Table VIII.
TABLE VIII: % error for benchmark by three methods
Benchmark | Instruction wise | Group by function | Group by cycles
FIR | 0.395418 | 2.968463 | 5.698703
JFDCTINT | -0.5864 | 2.144767 | 5.173162
MATMULT | 2.841366 | 11.16153 | 17.46789
STRING SEARCH | 4.559685 | 3.188004 | 17.03456
FDCT | -4.89428 | -0.87062 | 1.417293
As can be seen from Table VIII, the errors of instruction-wise calculation and of calculation based on grouping by function are close for most benchmarks, MATMULT being the exception. Thus software energy calculation can be further simplified by grouping instructions function-wise, which saves a lot of calculation and time.
Table IX shows the % difference between i) energy estimation by instruction and energy estimation by function wise grouping ii) energy estimation by instruction and energy estimation by cycle wise grouping.
TABLE IX: % error with instruction wise estimation
Benchmark | Instruction & Function grouping | Instruction & Cycle grouping
FIR | -2.56291 | -5.282397371
JFDCTINT | -2.74728 | -5.793533801
MATMULT | -8.09028 | -14.22241156
STRING SEARCH | 1.311864 | -11.9308627
FDCT | -4.23072 | -6.636375292
It is evident from Table IX that the difference between the instruction-wise estimate and the estimate based on function-wise grouping is small, except for the matmult benchmark, whereas the difference between the instruction-wise estimate and the estimate based on cycle-wise grouping is quite high. Thus software energy consumption can be estimated with reasonable accuracy by grouping instructions according to their functionality, which reduces the complexity of the calculation to a great extent.
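The paper does not state the formulas behind its percentage tables. The sketch below assumes a signed percentage error, (estimated - actual) / actual, and a difference taken relative to the instruction-wise estimate; under these assumptions the FIR entries of Table VIII (0.395418 and 2.968463) give about -2.5629, which agrees with the corresponding entry of Table IX. The function names are hypothetical.

```c
#include <assert.h>

/* Signed percentage error of an estimate against the measured value
 * (assumed convention: positive means overestimation). */
double percent_error(double estimated, double actual)
{
    assert(actual != 0.0);
    return (estimated - actual) / actual * 100.0;
}

/* Percentage difference between the instruction-wise estimate and a grouped
 * estimate, computed from their percentage errors against the same measurement. */
double percent_diff_from_errors(double err_instr_pct, double err_group_pct)
{
    double est_instr = 1.0 + err_instr_pct / 100.0;  /* estimate / actual */
    double est_group = 1.0 + err_group_pct / 100.0;
    assert(est_instr != 0.0);
    return (est_instr - est_group) / est_instr * 100.0;
}
```

Because both estimates are expressed relative to the same measured energy, the measured value cancels out and the comparison can be made from the Table VIII percentages alone.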
V. CONCLUSION
The two major areas where energy consumption can be minimized are hardware and software. Voltage scaling, frequency scaling, and keeping components in power-saving mode when not in use are the methods used for energy minimization in hardware. This research concentrates on the second area, i.e., minimization of the energy consumed by software. The major issues in software energy measurement are the measurement of core current, the inter-instruction effect, and static power. Core current has been measured using on-board current measurement. The processor core considered is the ARM Cortex M4. The inter-instruction effect and static power are accounted for by taking them as a fixed percentage of the estimated energy. This approach avoids a lot of calculation and saves time. An effort has been made to simplify the software energy estimation process by two methods: first, by grouping instructions based on their functionality, and second, by grouping instructions based on the number of cycles required for execution.
The percentage error between estimated and actual energy is found to range from -4.89% to 4.56% when each instruction is considered separately with its own energy cost, from -0.87% to 3.19% (except MATMULT, at 11.16%) when instructions are grouped by function, and from 1.42% to 17.47% when instructions are grouped by number of cycles. Thus software energy can be estimated either by considering each instruction individually or by grouping the instructions based on function. The grouping reduces the complexity involved in software energy estimation.
REFERENCES
[1] Cebrian A., Rey J., Tormos A., Millet J., “Adapting power consumption to performance requirements in a MSP430 microcontroller”, Spanish Conference on Electron Devices, 2005, pp. 83–86.
[2] Macii D., Negri L., “An Automatic Power Consumption Measurement Procedure for Bluetooth Modules”, Proceedings of the IEEE Instrumentation and Measurement Technology Conference, 2006, pp. 1182–1187.
[3] V. Tiwari, S. Malik, and A. Wolfe, “Power analysis of embedded software: A first step towards software power minimization”, IEEE Trans. VLSI Systems, vol. 2, no. 4, Dec. 1994, pp. 437–445.
[4] Wolf F., Kruse J., Ernst R., “Timing and Power Measurement in Static Software Analysis”, Microelectronics Journal, vol. 33, 2002, pp. 91–100.
[5] Theodore Laopoulos, Periklis Neofotistos, C. A. Kosmatopoulos, and Spiridon Nikolaidis, “Measurement of Current Variations for the Estimation of Software-Related Power Consumption”, IEEE Transactions on Instrumentation and Measurement, vol. 52, no. 4, August 2003, pp. 1206–1212.
[6] Bircher W. L., Valluri M., Law J., John L. K., “Runtime identification of microprocessor energy saving opportunities”, Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '05), 2005, pp. 275–280.
[7] Naehyuck Chang, Kwanho Kim, and Hyung Gyu Lee, “Cycle-Accurate Energy Measurement and Characterization with a Case Study of the ARM7TDMI”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 10, no. 2, April 2002, pp. 146–154.
[8] Krintz C., Wen Y., Wolski R., “Application-level prediction of battery dissipation”, Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '04), 2004, pp. 224–229.
[9] V. A. Kulkarni and G. R. Udupi, “Instruction Level Power Consumption Estimation – Issues and Review”, Journal of Multidisciplinary Engineering Science and Technology (JMEST), ISSN: 2458-9403, vol. 4, issue 2, February 2017, pp. 6776–6781.
[10] V. A. Kulkarni and G. R. Udupi, “A Simplified Software Energy Consumption Estimation for Embedded System”, Journal of Embedded Systems, vol. 4, no. 1, 2017, pp. 7–12.
[11] H. Blume, D. Becker, L. Rotenberg, M. Botteck, J. Brakensiek, T. G. Noll, “Hybrid functional and instruction level power modeling for embedded and heterogeneous processor architectures”, Journal of Systems Architecture, vol. 53, 2007, pp. 689–702.
[12] Vasilios Konstantakos, Alexander Chatzigeorgiou, Spiridon Nikolaidis, and Theodore Laopoulos, “Energy Consumption Estimation in Embedded Systems”, IEEE Transactions on Instrumentation and Measurement, vol. 57, no. 4, April 2008, pp. 797–804.
[13] Mostafa E. A. Ibrahim, Markus Rupp, and Hossam A. H. Fahmy, “Precise High-Level Power Consumption Model for Embedded Systems Software”, EURASIP Journal on Embedded Systems, Hindawi Publishing Corporation, vol. 2011, Article ID 480805, 14 pages, doi:10.1155/2011/480805.
[14] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown, “MiBench: A free, commercially representative embedded benchmark suite”, in Proc. IEEE Int. Workshop Workload Characterization, Dec. 2001, pp. 3–14.
[15] WCET benchmark [online]. Available: http://www.mrtc.mdh.se/projects/wcet/benchmarks.html
[16] Jan Gustafsson, Adam Betts, Andreas Ermedahl, and Björn Lisper, “The Mälardalen WCET benchmarks – past, present and future”, in Björn Lisper, editor, Proc. 10th International Workshop on Worst-Case Execution Time Analysis (WCET’2010), Brussels, Belgium, July 2010, pp. 137–147.
[17] V. A. Kulkarni and G. R. Udupi, “A Simplified Method for Instruction Level Energy Estimation for Embedded System”, European Journal of Engineering Research and Science (EJERS), vol. 2, no. 5, May 2017. DOI: http://dx.doi.org/10.24018/ejers.2017.2.5.359.