Research on the Relationship of Approximate Computing Application Precision and Hardware Resource Utilization


2018 International Conference on Computer, Electronic Information and Communications (CEIC 2018) ISBN: 978-1-60595-557-5


Feng-yu GUO and Xiao-ying WANG*

Department of Computer Technology and Applications, Qinghai University, Xining, Qinghai, China, 810016

*Corresponding author

Keywords: Approximate computing, Loop perforation, Approximate application, Resource utilization.

Abstract. The rapid growth of data centers has brought serious energy consumption problems. To address them, we intend to power data centers with renewable energy such as solar energy. Because solar power varies with climate, season, weather conditions and other factors, it is a relatively unstable energy source, so data centers must regulate their load at runtime to follow the available energy. We adjust the load on a server by trading off task precision against server workload. This paper focuses on approximate computing applications: we use the loop perforation technique to adjust application precision and measure the hardware resource usage of each application at each precision level. We also analyze how the average hardware resource utilization changes as the application precision changes.

Introduction

A conventional application returns a reliable and accurate result to meet the requirements of the user. However, in some areas, such as search engines and video applications, an exact result is not strictly necessary. When a user issues a search query, it is feasible to reduce the workload of the search engine and still return a reasonably satisfactory result. Similarly, in video applications, occasional dropped frames go unnoticed by users but can save a considerable amount of energy. Therefore, adjusting the accuracy of such applications is of great significance for energy conservation and emission reduction, provided that the user's demand is still satisfied.

Related Work


approximate variables in the application, and analyze the importance of each variable to the output result. Sidiroglou-Douskos et al. [7] analyzed the programs of the PARSEC 3.0 benchmark suite using the loop perforation technique, identified the program segments to which loop perforation can be applied, and quantified the accuracy of the perforated programs. Zhong [8] studied the inherent fault tolerance of approximate computing programs and selected several typical applications to identify their approximable parts. Nair [9] proposed approximate computing methods for hardware design, data collection and data processing on low-power embedded devices, so as to preserve service quality while reducing running power consumption and prolonging the working time of the equipment. Moons et al. [10] proposed an approximate computing method for convolutional neural networks that can reduce the power consumption of the program by a factor of 30 while the accuracy drops to 99%. Yazdanbakhsh et al. [11] provided a benchmark suite for approximate computing research called AxBench.

Introduction of Loop Perforation

Loop perforation is a common way to adjust the accuracy of an application: it lowers the accuracy of the program's results by executing only a subset of the computation performed by the original application. The mechanism of loop perforation is described as follows:

A common loop in an application looks like this:

    for (int i = 0; i < n; i++) {
        body;
    }

The loop modified by loop perforation is:

    for (int i = 0; i < n; i = i + t) {
        body;
    }

or, equivalently:

    for (int i = 0; i < n; i++) {
        if (i % t == 0) {
            body;
        }
    }

The loop control variable i in the first program starts at 0, and the loop body runs n times in total. In the second program i also starts at 0, but the body runs only about n/t times (n/t rounded up), where t is the loop jump variable. By setting different values of t, the loop skips part of the computation and runs only a subset of the work of the original program, which lowers the program's accuracy.
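As an illustration of the mechanism described above, the following is a minimal C sketch of a perforated loop whose body accumulates a sum. The function names `perforated_sum` and `perforated_iterations` and the summation body are assumptions chosen for illustration; they are not taken from the benchmark programs studied in this paper.

```c
#include <assert.h>
#include <stddef.h>

/* Sum data[0..n-1], visiting only every t-th element (perforated loop).
 * With t == 1 this is the original, unperforated loop.
 * Hypothetical example body, not code from PARSEC. */
double perforated_sum(const double *data, size_t n, size_t t)
{
    assert(t > 0);                 /* a zero stride would never terminate */
    double sum = 0.0;
    for (size_t i = 0; i < n; i += t) {
        sum += data[i];            /* the loop "body" */
    }
    return sum;
}

/* Number of times the perforated loop body executes: n / t rounded up. */
size_t perforated_iterations(size_t n, size_t t)
{
    assert(t > 0);
    return (n + t - 1) / t;
}
```

With n = 10 and t = 3 the body runs for i = 0, 3, 6, 9, that is 4 times instead of 10, so the result is computed from a subset of the input and is generally less accurate than the original.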

Application Selection and Mechanism Analysis

Parsec3.0

In this paper we select bodytrack, ferret and canneal from the PARSEC 3.0 benchmark suite as the approximate computing applications. The input and output data of these three applications are analyzed below:

Bodytrack: this is a computer vision application that uses an annealed particle filter to track the movement of a human body. The program produces two kinds of output: data files containing the body position vectors, and corresponding BMP image files showing the tracked human pose in each image. In this paper, the accuracy of the application is quantified using the body position vector data files produced under different precision settings.


Ferret: this is an image search engine application that analyzes the content of an input image and finds the 10 most similar images in a given image database. The input of the program is the query image and the image database. The result is a list of numbers identifying the 10 database images most similar to the query image. In this paper, we use the numbers of the ten similar images returned by the search engine to quantify the accuracy of the program.

After a picture is entered, ferret analyzes the contents of the image. According to the analysis results, the search engine queries the given image database and returns the 10 images closest to the input image. Ferret first segments the input image, extracts a feature vector from each segmented image block, and compares the extracted feature vectors with those of the image blocks in the database to determine the similarity of two images. Before a given image is compared with the pictures in the database, a set of candidate images must be selected from the database. The LSH_query_bootstrap() function determines the candidate images in the image database; when loop perforation is applied to this function, part of the image database is skipped. LSH_query_merge() arranges the array of images that have been retrieved [7].

Canneal: this program uses a simulated annealing algorithm to minimize the routing cost in chip design. The input of the application is the synthetic netlist of a chip, and the program outputs a number that represents the total routing cost of the chip. This paper uses this number to quantify the program precision.

Canneal uses the routing information in the netlist to compute the total routing cost. The run() method calculates the routing cost and determines whether the optimal solution has been reached. The reload() method constructs a new state variable using the Mersenne Twister random number generator [7][12].

Accuracy Quantization

After an application is modified by loop perforation, its accuracy changes. We use a relative error model to quantify the accuracy: it is based on the output of the original program and on the difference between the output after loop perforation and the original output. The application precision model is as follows:

$$ \mathrm{accuracy} = 1 - \frac{1}{n}\sum_{i=1}^{n}\frac{|o_i - o_{original}|}{|o_{original}|} \qquad (1) $$

where $o_i$ is the output of the i-th run and $o_{original}$ is the output of the original application. Because an application must be tested several times to obtain a more reliable value, the relative errors of the n runs are averaged.

We performed loop perforation on the selected loops and recorded the running time of the modified programs; the experimental results are shown in Table 1.

Table 1. Application runtime and speedup.

         Bodytrack            Ferret               Canneal
i=i+t    Runtime   Speedup    Runtime   Speedup    Runtime   Speedup
i=i+1    1.683s    0          8.0742s   0          7.195s    0
i=i+2    1.336s    1.26       6.535s    1.23       6.23s     1.15
i=i+3    1.401s    1.20       5.857s    1.37       5.641s    1.27
i=i+4    0.992s    1.7        4.164s    2.04       5.482s    1.31
i=i+5    0.96s     1.75       -         -          5.311s    1.35
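The speedup values of the perforated rows in Table 1 are consistent with the ratio of the original runtime to the perforated runtime (for example, 1.683 s / 1.336 s ≈ 1.26 for bodytrack at i=i+2). A minimal C sketch of that calculation follows; the function name `speedup` is hypothetical, and reading the column as this ratio is an assumption based on the tabulated numbers.

```c
#include <assert.h>

/* Speedup of a perforated run relative to the original run.
 * Both runtimes are in seconds. Hypothetical helper, not from the paper. */
double speedup(double runtime_original, double runtime_perforated)
{
    assert(runtime_original > 0.0 && runtime_perforated > 0.0);
    return runtime_original / runtime_perforated;
}
```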


application which has been modified by loop perforation. $acc_{bodytrack}$ is the accuracy of the current application. The precision quantization formula is shown in Eq. (2).

$$ acc_{bodytrack} = 1 - \frac{|o_{new} - o_{original}|}{|o_{original}|} \qquad (2) $$

The precision of the bodytrack program for different values of the loop jump variable t is shown in Table 2.

Table 2. Bodytrack precision quantization.

i=i+t    Relative error    Accuracy
i=i+1    0                 100%
i=i+2    0.182138          81.78%
i=i+3    0.250038          74.99%
i=i+4    0.317624          68.23%
i=i+5    0.377285          62.27%

We selected values of the loop jump variable t from 1 to 4. When t reaches 5, the drop in accuracy is too severe, so the range of t is limited to 1 to 4. We performed loop perforation on these loops and recorded the running time of the modified program; the experimental results are shown in Table 1.

As analyzed above, the output of ferret is the numbers of the 10 images in the given database that are most similar to the query image. After modification by loop perforation, ferret may return a different set of 10 image numbers than the original program. We therefore quantify the accuracy of ferret with the following steps:

1) Input a query image, run the original ferret, and record the 10 returned image numbers; these numbers form the original output set $S_{original}$.

2) Modify ferret with loop perforation, input the same image to the modified ferret, and record the 10 newly returned image numbers; these numbers form the modified output set $S_{new}$. Define a new set $S_{error}$, obtained by Eq. (3):

$$ S_{error} = S_{new} \cap S_{original} \qquad (3) $$

3) Calculate the accuracy of the ferret program using Eq. (4):

$$ acc_{ferret} = \frac{|S_{error}|}{|S_{original}|} \qquad (4) $$

The above steps are the process of quantifying ferret accuracy. Following these steps with 256 images as input, we obtained the experimental data shown in Table 3.
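The steps above can be sketched in C for a single query as follows. This is a minimal illustration under stated assumptions: the function names `overlap_count` and `ferret_accuracy` are hypothetical, image numbers are assumed to be unique within each result list, and the accuracy is taken as the size of the intersection divided by the size of the original result set. How the per-query results were aggregated over the 256 input images is not specified in the text, so the sketch does not attempt to reproduce the tabulated values.

```c
#include <assert.h>
#include <stddef.h>

/* Count image numbers that appear in both result lists,
 * i.e. the size of the intersection of the two sets.
 * Assumes numbers are unique within each list. */
size_t overlap_count(const int *original, size_t n_orig,
                     const int *perforated, size_t n_new)
{
    size_t common = 0;
    for (size_t i = 0; i < n_new; i++) {
        for (size_t j = 0; j < n_orig; j++) {
            if (perforated[i] == original[j]) {
                common++;
                break;             /* each number is counted at most once */
            }
        }
    }
    return common;
}

/* Accuracy of one query: |intersection| / |original result set|. */
double ferret_accuracy(const int *original, size_t n_orig,
                       const int *perforated, size_t n_new)
{
    assert(n_orig > 0);
    return (double)overlap_count(original, n_orig, perforated, n_new)
           / (double)n_orig;
}
```

For example, if 3 of the 10 image numbers returned by the perforated program also appear in the original result, this sketch reports an accuracy of 0.3 for that query.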

Table 3. Ferret precision quantization.

i=i+t    S_original    S_new    S_error    Accuracy
i=i+1    2560          2560     0          100%
i=i+2    2560          2560     931        35%
i=i+3    2560          2560     663        25%
i=i+4    2560          2560     316        11%

We selected values of the loop jump variable t from 1 to 5, performed loop perforation on these loops, and recorded the running time of the modified program; the experimental results are shown in Table 1.

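As analyzed above, the output of canneal is a single number representing the total routing cost of the chip, so we use the routing cost to quantify the accuracy of canneal. The relative error formula for canneal is given in Eq. (5):

$$ error_{canneal} = \frac{|cost_{new} - cost_{original}|}{cost_{original}} \qquad (5) $$

where $cost_{new}$ is the routing cost reported by the perforated program and $cost_{original}$ is the routing cost reported by the original program.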

The experimental data obtained are shown in Table 4.

Table 4. Canneal precision quantization.

i=i+t    Routing cost    Relative error    Accuracy
i=i+1    690232000       0                 100%
i=i+2    715820000       0.037071593       96.29284%
i=i+3    728136000       0.054914869       94.50851%
i=i+4    735545000       0.065648941       93.43511%
i=i+5    740513000       0.072846521       93.31535%
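The relative-error accuracy used for the numeric outputs can be sketched in C as below. The function name `mean_accuracy` is hypothetical, and averaging the relative error over several runs before subtracting from 1 is an assumption based on the description of the precision model. With a single run it reduces to one minus the relative error, which is how the i=i+2 row of Table 4 is obtained: (715820000 - 690232000) / 690232000 ≈ 0.0371, giving an accuracy of about 96.29%.

```c
#include <assert.h>
#include <stddef.h>

/* Accuracy = 1 - mean relative error of the perforated outputs
 * with respect to the original output. Hypothetical helper. */
double mean_accuracy(const double *outputs, size_t n, double original)
{
    assert(n > 0 && original != 0.0);
    double denom = original < 0.0 ? -original : original;
    double err_sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double diff = outputs[i] - original;
        if (diff < 0.0) {
            diff = -diff;          /* absolute difference */
        }
        err_sum += diff / denom;   /* relative error of run i */
    }
    return 1.0 - err_sum / (double)n;
}
```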

Application Accuracy and Hardware Resource Usage Modeling

The power consumption of a server is closely related to its hardware resource utilization, so we build a model relating application precision to server hardware resource usage.

The experimental platform is described in Table 5.

Table 5. Experiment platform.

CPU                       Memory    I/O        Linux version
Intel-i5-4210m 3.2GHz     1GB       528MB/s    CentOS Linux release 7.2.1511 (Core)

We first modify the application by loop perforation, run it on the platform, and record the CPU usage and memory usage. Because the hardware resource utilization of the server changes dynamically, we ran five experiments for each application and averaged the hardware resource utilization over the five runs to obtain relatively stable and accurate values.


The change of hardware resource utilization as the application precision changes is shown in Table 6.

Table 6. Application runtime and hardware usage.

         Bodytrack real usage         Ferret real usage            Canneal real usage
i=i+t    CPU      memory   runtime    CPU      memory   runtime    CPU      memory   runtime
i=i+1    82.8%    47.4%    1.683s     91.4%    49.8%    8.0742s    99.2%    53.4%    7.195s
i=i+2    68%      47%      1.336s     90.2%    48.6%    6.535s     97.2%    53%      6.23s
i=i+3    64%      46.6%    1.401s     88.8%    47.2%    5.857s     97.6%    51.8%    5.641s
i=i+4    63.4%    45.8%    0.992s     86.8%    46.2%    4.164s     96.8%    51.4%    5.482s
i=i+5    51.4%    45.2%    0.96s      -        -        -          97.6%    50.2%    5.311s

These data show that CPU usage and memory usage change as the application accuracy changes. The CPU utilization of bodytrack changes rapidly, whereas the CPU utilization of ferret and canneal changes less. The memory usage of all three applications changes little. We therefore use the runtime to compute an average hardware usage, as shown in Eq. (6).

$$ \overline{\mathit{usage}}_t = \frac{\mathit{usage\_real}_t \cdot \mathit{time\_real}_t}{\mathit{time\_max}} \qquad (6) $$

where $\mathit{usage\_real}_t$ is the real hardware usage of the application when the loop jump variable is t, $\mathit{time\_real}_t$ is the real runtime of the application when the loop jump variable is t, $\mathit{time\_max}$ is the runtime of the original application, which has not been modified by loop perforation, and $\overline{\mathit{usage}}_t$ is the average hardware usage of the application.
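Eq. (6) scales the measured usage by the fraction of the original runtime that the perforated program actually runs. A minimal C sketch is given below; the function name `average_usage` and its parameter names are hypothetical. As a check against Table 6, bodytrack at i=i+2 has a real CPU usage of 68% and a runtime of 1.336 s against an original runtime of 1.683 s, which gives about 53.98%.

```c
#include <assert.h>

/* Average hardware usage per Eq. (6): the real usage weighted by the
 * ratio of the perforated runtime to the original runtime.
 * usage_real is in percent; runtimes are in seconds. */
double average_usage(double usage_real, double runtime_real,
                     double runtime_original)
{
    assert(runtime_original > 0.0 && runtime_real >= 0.0);
    return usage_real * runtime_real / runtime_original;
}
```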


Table 7. Application average hardware usage.

         Bodytrack average usage       Ferret average usage          Canneal average usage
i=i+t    CPU       memory    accuracy  CPU       memory    accuracy  CPU       memory    accuracy
i=i+1    82.8%     47.4%     100%      91.40%    49.80%    100%      99.2%     53.4%     100%
i=i+2    53.97%    37.30%    81.78%    73.01%    39.34%    35%       84.16%    45.89%    96.29284%
i=i+3    53.27%    38.71%    74.99%    64.42%    34.24%    25%       76.52%    40.61%    94.50851%
i=i+4    37.36%    26.92%    68.23%    44.76%    23.83%    11%       73.75%    39.16%    93.43511%
i=i+5    29.31%    25.71%    62.27%    -         -         -         72.04%    37.05%    93.31535%

We can observe that the average CPU usage and memory usage of each application change markedly with accuracy. Taking the accuracy as input x and the average usage as output y, we use multiple linear regression to build a model for each application. Eq. (7)-(9) are the models of bodytrack, ferret and canneal, respectively, relating average CPU utilization to application accuracy. Similarly, Eq. (10)-(12) are the models of the three applications relating average memory utilization to application accuracy.

$$ y = 16.2050\,x^{3} - 39.1970\,x^{2} + 32.4493\,x - 8.6305 \qquad (7) $$

$$ y = 0.5705\,x^{3} - 1.378\,x^{2} + 1.182\,x + 0.1243 \qquad (8) $$

$$ y = 1.69\,x^{3} - 3.472\,x^{2} + 2.482\,x + 0.2144 \qquad (9) $$

$$ y = 1.8834\,x^{3} - 5.2544\,x^{2} + 5.3312\,x - 1.4873 \qquad (10) $$

$$ y = 0.5705\,x^{3} - 1.378\,x^{2} + 1.182\,x + 0.1243 \qquad (11) $$

$$ y = -356.2\,x^{3} + 1025\,x^{2} - 980.7\,x + 312.3 \qquad (12) $$
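To show how such a model is applied, the sketch below evaluates Eq. (7), read as a cubic polynomial in the accuracy x, using Horner's rule. It assumes that x and y are expressed as fractions (1.0 = 100%), which matches the fitted data: at x = 1 the polynomial gives about 0.827, close to the 82.8% CPU usage measured for the original bodytrack. The names `eval_cubic` and `bodytrack_cpu_coeffs` are hypothetical.

```c
#include <assert.h>

/* Coefficients of Eq. (7) (bodytrack, average CPU usage),
 * ordered from the x^3 term down to the constant term. */
static const double bodytrack_cpu_coeffs[4] = {
    16.2050, -39.1970, 32.4493, -8.6305
};

/* Evaluate c[0]*x^3 + c[1]*x^2 + c[2]*x + c[3] with Horner's rule.
 * x is the application accuracy as a fraction in [0, 1]. */
double eval_cubic(const double c[4], double x)
{
    assert(x >= 0.0 && x <= 1.0);
    return ((c[0] * x + c[1]) * x + c[2]) * x + c[3];
}
```

The models were fitted on accuracies between roughly 0.62 and 1.0 for bodytrack, so evaluating the polynomial far outside that range should not be expected to give meaningful usage values.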

Modeling Accuracy Verification


To verify the models, the loop perforation test was performed again for each program. Each program was tested twice; the results were quantified with the precision quantization methods above, and the average CPU utilization and average memory utilization at the resulting accuracy were measured. The test data are shown below:

Table 8. Verification tested data.

Application    Accuracy    CPU average usage    Memory average usage
Bodytrack      90%         66.8%                45.8%
               77%         56.0%                39.6%
Ferret         43%         90%                  50%
               37%         80%                  50%
Canneal        88.75%      90%                  45.2%
               88.68%      75%                  44.8%

These are all measured experimental data. Using the accuracy of each application as input, we compute the average CPU usage and memory usage with the corresponding model and compare the model output with the measured data to verify the accuracy of the model. The comparison results are shown below:

Table 9. Model verification data.

Application    Accuracy    CPU real data    CPU model output    Memory real data    Memory model output
Bodytrack      90%         66.8%            69.06%              45.8%               42.96%
               77%         56%              51.42%              39.6%               36.24%
Ferret         43%         89%              77.39%              45.8%               42.83%
               37%         81%              74.29%              43%                 40.26%


These data show that the models fit well: the relative error of the model outputs is mostly within 10%.

Summary

In this paper, three applications, bodytrack, ferret and canneal, were selected as research targets. We analyzed their mechanisms and identified the loop segments that can be modified by loop perforation. We used the loop perforation technique to modify these loop segments in each application and recorded the outputs. We then used the relative error model to quantify the accuracy of each application and measured the hardware resource usage of each application at different accuracy levels. Finally, we modeled the relationship between application accuracy and hardware resource usage and verified the models. The verification results show that the relative error of these models is mostly within 10%, so the models fit well.

Acknowledgements

This paper is partially supported by the National Natural Science Foundation of China (No. 61762074, No. 61563044 and No. 61640206), the Open Research Fund Program of the State Key Laboratory of Hydroscience and Engineering (No. sklhse-2017-A-05), and the National Natural Science Foundation of Qinghai Province (No. 2015-ZJ-725).

References

[1] Han J, Orshansky M. Approximate computing: An emerging paradigm for energy-efficient design[C]//Test Symposium. IEEE, 2013:1-6.

[2] Shafique M, Shafique M, Shafique M, et al. Embracing approximate computing for energy-efficient motion estimation in high efficiency video coding[C]//Conference on Design, Automation & Test in Europe. European Design and Automation Association, 2017:1388-1393.

[3] Chippa V K, Chakradhar S T, Roy K, et al. Analysis and characterization of inherent application resilience for approximate computing[C]//Design Automation Conference. IEEE, 2013:113.

[4] Zhang Q, Yuan F, Ye R, et al. ApproxIt: An Approximate Computing Framework for Iterative Methods[C]//Design Automation Conference. IEEE, 2014:1-6.

[5] Roy P, Ray R, Wang C, et al. ASAC: automatic sensitivity analysis for approximate computing[J]. ACM SIGPLAN Notices, 2014, 49(5):95-104.

[6] Raha A, Venkataramani S, Raghunathan V, et al. Quality configurable reduce-and-rank for energy efficient approximate computing[C]//Design, Automation & Test in Europe Conference & Exhibition. EDA Consortium, 2015:665-670.

[7] Sidiroglou-Douskos S, Misailovic S, Hoffmann H, et al. Managing performance vs. accuracy trade-offs with loop perforation[C]//Sigsoft/fse'11, ACM Sigsoft Symposium on the Foundations of Software Engineering. DBLP, 2011:124-134.

[8] Zhong L. Broad. bold and reliable online approximate computing framework for diverse applications[J]. 2015.

[9] Nair R. Models for energy-efficient approximate computing[C]//ACM/IEEE International Symposium on Low Power Electronics and Design. ACM, 2010:359-360.


[11] Yazdanbakhsh A, Mahajan D, Lotfi-Kamran P, et al. AXBENCH: A Multi-Platform Benchmark Suite for Approximate Computing[J]. IEEE Design & Test, 2017, PP(99):1-1.