MR Butterfly—A fast Fourier Transform Algorithm Based on MapReduce

Academic year: 2020


2018 International Conference on Computer, Communication and Network Technology (CCNT 2018) ISBN: 978-1-60595-561-2

MR-Butterfly—A fast Fourier Transform Algorithm Based on MapReduce

Yu YU

Institute of Ocean Instruments and Metrology, Qilu University of Technology, Shandong Academy of Sciences, Qingdao 266100, China

Keywords: MapReduce model, FFT, Cooley-Tukey algorithm.

Abstract. Based on an analysis of the butterfly computation structure of the Cooley-Tukey algorithm, this paper proposes MR-Butterfly, a fast Fourier transform algorithm for large data. The algorithm exploits the fact that the butterfly computation structure of the fast Fourier transform can be determined in advance, so the complex multiplications and complex additions of the Cooley-Tukey butterfly units can be processed in batches. The MR-Butterfly algorithm does not need to deal with synchronization and communication. The algorithm is simple in structure, robust, general and extensible. The experimental part verifies the validity of the algorithm using test data of large volume.

Introduction

The fast Fourier transform (FFT) holds an extremely important position in scientific computing. In [1], terabit-level integer multiplication is carried out with the Schönhage-Strassen algorithm on the MapReduce model. The Schönhage-Strassen algorithm is an integer multiplication algorithm based on the FFT. In [2], the discrete Fourier transform is used to reduce the dimensionality of large-scale time series data, and on that basis a similarity search method suitable for large-scale time series databases is proposed.

The Cooley-Tukey algorithm, first proposed in [3], has become the most commonly used FFT algorithm. In [4], the performance of the Cooley-Tukey algorithm on the IBM Cyclops many-core architecture is analyzed using PAR-R2-FFT, a parallel radix-2 decimation-in-time FFT algorithm.

For an N-point FFT, the PAR-R2-FFT algorithm consists of $\log_2 N$ levels of computation. The butterfly units are distributed evenly among the computing units for concurrent processing. During the calculation, pre-stored butterfly shuffle coefficients and butterfly coefficients are used, and the intermediate results of the algorithm are shared through global shared memory. In [5], the proposed FFT algorithm consists of three steps. The first step reorders the data sequence into bit-reversed order and divides it into p blocks; the N/p data items of each block are loaded sequentially into the processing units PE[0] to PE[p-1], where p is the total number of processing units and N is the length of the data sequence. Steps two and three perform the actual FFT: step two is the sequentially executed part and step three is the part executed in parallel. The sequential part covers the first $\log_2(N/p)$ stages of the N-point FFT, in which each execution unit transforms the N/p data items stored in its local memory. The parallel part covers the remaining $\log_2 p$ stages of the N-point FFT. The index distance

MR-Butterfly

The MR-Butterfly algorithm is based on the MapReduce model of cloud computing. Parallelizing the Cooley-Tukey butterfly operation makes MR-Butterfly suitable for the fast discrete Fourier transform of massive data. Its core idea is to exploit the fact that the butterfly computation structure of the fast Fourier transform can be determined in advance, and to process the complex multiplications and complex additions of every Cooley-Tukey butterfly unit in centralized batches. Compared with traditional parallel FFT algorithms, MR-Butterfly has the advantage that it does not need to deal with synchronization and communication. The algorithm has a simple structure, is robust and scalable, and applies to a wide range of data sizes.

The core of the MR-Butterfly algorithm is the calculation of the butterfly shuffle coefficient $S_N^r$ and the calculation of the butterfly coefficient $W_N^r$. The following are detailed descriptions of $S_N^r$ and $W_N^r$.

Calculation of $S_N^r$

In the MR-Butterfly algorithm, this calculation can be abstracted as a function:

$$R[2] = SNR(i, j, N) \qquad (1)$$

The array $R$ stores the result of the transformation, namely the keys $i$ and $i + a_{ij} \times 2^j$, where $i$ is the input index, $j$ is the current stage and $N$ is the length of the sequence.

Evaluating the function $SNR(i, j, N)$ directly saves the storage space of the matrix $T_{N \times \log_2 N}$ and is better suited to parallel programs. The MR-Butterfly algorithm therefore computes $SNR(i, j, N)$ directly.

Figure 1. Matrix division, panels (a) and (b); the column dimension of each matrix is $\log_2 N$.

As depicted in Figure 1-(a), each column of the matrix $T_{N \times \log_2 N}$ is divided into two areas. The black region has a transform matrix coefficient of -1. Therefore, evaluating $SNR(i, j, N)$ is equivalent to determining whether input $i$ at level $j$ lies in the black area of column $j$ of the matrix $T_{N \times \log_2 N}$.

The specific steps for evaluating $SNR(i, j, N)$ directly are as follows:

a. If $j = 0$: if $i$ is even, then $R[0] = i$, $R[1] = i + 1$; otherwise $R[0] = i - 1$, $R[1] = i$. Return $R$.

b. Let $m = 2^{j+1}$ and $k = \lfloor i/m \rfloor$, and calculate the relative position $rp$: if $i \le k \times m$, then $rp = k \times m - i - 1$; otherwise $rp = (k + 1) \times m - i - 1$.

c. Calculation of $R$:

If $rp < 0$, then $R[0] = i$, $R[1] = i + 2^j$; return $R$.

If $rp \le 2^j - 1$, then $R[0] = i - 2^j$, $R[1] = i$; return $R$. Otherwise, $R[0] = i$, $R[1] = i + 2^j$; return $R$.
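The steps for $SNR(i, j, N)$ can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `snr` is hypothetical, $k = [i/m]$ is read as integer division, and the comparison in step c is assumed to be $rp \le 2^j - 1$, which is the reading that pairs every index with its butterfly partner.

```python
def snr(i, j, n):
    """Return [R[0], R[1]], the two keys associated with input i at stage j.

    n (the sequence length) is kept only to mirror the signature SNR(i, j, N);
    the steps as described do not use it.
    """
    # Step a: at stage 0, adjacent even/odd indices form a pair.
    if j == 0:
        return [i, i + 1] if i % 2 == 0 else [i - 1, i]

    # Step b: relative position of i inside its block of m = 2^(j+1) items.
    m = 2 ** (j + 1)
    k = i // m
    if i <= k * m:
        rp = k * m - i - 1          # only reached when i is the first item of a block
    else:
        rp = (k + 1) * m - i - 1

    # Step c: decide which half of the block i lies in.
    if rp < 0:
        return [i, i + 2 ** j]
    if rp <= 2 ** j - 1:            # assumed comparison (lower half of the block)
        return [i - 2 ** j, i]
    return [i, i + 2 ** j]          # upper half of the block
```

For example, with $N = 8$ the call `snr(5, 2, 8)` gives `[1, 5]`: at stage 2, inputs 1 and 5 belong to the same butterfly.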

Calculation of $W_N^r$

In the radix-2 decimation-in-time Cooley-Tukey algorithm, the transformation matrix corresponding to the butterfly coefficients $W_N^r$ of the butterfly operation structure is as follows (the entries are the exponents $r$, shown for $N = 8$):

$$M_{N \times \log_2 N} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 2 \\ 0 & 2 & 3 \end{bmatrix}$$

Figure 2. Matrix $W_N^r$.

In the MR-Butterfly algorithm, this calculation can be abstracted as a function:

$$C[2] = WNR(i, j, N) \qquad (2)$$

The array $C$ stores the transformed results $W_N^r$ and $-W_N^r$, where $r$ is determined by the relative position of the input index $i$, $j$ is the current level and $N$ is the length of the sequence.

The calculation of $W_N^r$ follows the same pattern as that of $S_N^r$. The matrix $M_{N \times \log_2 N}$ is divided as in Figure 1-(b). The specific steps for evaluating $WNR(i, j, N)$ are as follows:

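a. If $j = 0$: if $i$ is even, then $C[0] = 1$, $C[1] = 1$; otherwise $C[0] = 1$, $C[1] = -1$. Return $C$.

b. Let $m = 2^{j+1}$ and $k = \lfloor i/m \rfloor$, and calculate the relative position $rp$: if $i \le k \times m$, then $rp = k \times m - i - 1$; otherwise $rp = (k + 1) \times m - i - 1$.

c. Calculation of $C$:

If $rp < 0$, then $C[0] = 1$, $C[1] = 1$; return $C$.

If $rp \le 2^j - 1$, then let $w = \log_2 N$, $l = 2^j$ and $rV = 2^{w-j-1} \times (l - rp - 1)$, and compute

$$real = \cos(-2 \times PI \times rV / N), \qquad im = \sin(-2 \times PI \times rV / N)$$

$$C[0] = real + im \times \mathrm{i}, \qquad C[1] = -real - im \times \mathrm{i}$$

where $\mathrm{i} = \sqrt{-1}$ is the imaginary unit (not the input index $i$); return $C$.

If $rp > 2^j - 1$, then $C[0] = 1$, $C[1] = 1$; return $C$.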
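The steps for $WNR(i, j, N)$ can be sketched in the same way. Again this is only an illustration under stated assumptions: the name `wnr` is hypothetical, $N$ is assumed to be a power of two, $k = [i/m]$ is read as integer division, and the middle case of step c is read as $rp \le 2^j - 1$.

```python
import math


def wnr(i, j, n):
    """Return [C[0], C[1]], the two butterfly coefficients for input i at stage j."""
    # Step a: stage 0 uses only the coefficients +1 and -1.
    if j == 0:
        return [1, 1] if i % 2 == 0 else [1, -1]

    # Step b: relative position of i inside its block of m = 2^(j+1) items.
    m = 2 ** (j + 1)
    k = i // m
    if i <= k * m:
        rp = k * m - i - 1
    else:
        rp = (k + 1) * m - i - 1

    # Step c: inputs in the upper half of a block keep the coefficients 1, 1.
    if rp < 0 or rp > 2 ** j - 1:
        return [1, 1]

    # Lower half of a block: coefficient W_N^rV = exp(-2*pi*i*rV/N) and its negative.
    w = n.bit_length() - 1          # log2(N), assuming N is a power of two
    l = 2 ** j
    rv = 2 ** (w - j - 1) * (l - rp - 1)
    real = math.cos(-2 * math.pi * rv / n)
    im = math.sin(-2 * math.pi * rv / n)
    c0 = complex(real, im)
    return [c0, -c0]
```

With $N = 8$, `wnr(7, 2, 8)` uses the exponent $rV = 3$ and `wnr(3, 1, 8)` uses $rV = 2$, which agrees with the last column and the middle column of the matrix in Figure 2.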

From the above description, it can be seen that each computing unit can independently calculate $S_N^r$ and $W_N^r$ given the parameters $(i, j, N)$, so the MR-Butterfly algorithm supports parallel processing.
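The independence of the per-input calculations can be illustrated with a small in-memory emulation of one map step and one reduce step per stage. The sketch below rests on several assumptions that the text does not state: that each input $i$ emits its value multiplied by $C[0]$ and $C[1]$ under the keys $R[0]$ and $R[1]$, that the reduce step sums the values sharing a key, and that the input is first placed in bit-reversed order. The helper names are hypothetical, and the keys and coefficients are computed from the offset of $i$ within its block, which is a compact restatement of the $rp$ tests above.

```python
import cmath


def butterfly_emit(i, j, n):
    """(key, coefficient) pairs emitted for input i at stage j (assumed map output)."""
    half = 2 ** j
    offset = i % (2 * half)                  # position of i inside its block
    if offset < half:                        # upper half: coefficients 1 and 1
        return [(i, 1), (i + half, 1)]
    r = (offset - half) * (n // (2 * half))  # exponent of the coefficient W_N^r
    w = cmath.exp(-2j * cmath.pi * r / n)
    return [(i - half, w), (i, -w)]          # lower half: W and -W


def mr_stage(values, j):
    """One stage: map every input independently, then sum the values per key."""
    n = len(values)
    emitted = [(key, c * x)
               for i, x in enumerate(values)
               for key, c in butterfly_emit(i, j, n)]
    out = [0j] * n
    for key, v in emitted:                   # stands in for shuffle, sort and add
        out[key] += v
    return out


def bit_reverse(x):
    """Reorder x so that item i moves to the bit-reversed index (assumed preprocessing)."""
    bits = len(x).bit_length() - 1
    return [x[int(format(i, "0{}b".format(bits))[::-1], 2)] for i in range(len(x))]


def mr_butterfly_fft(x):
    """Emulate all log2(N) stages; len(x) must be a power of two, at least 2."""
    values = bit_reverse([complex(v) for v in x])
    for j in range(len(x).bit_length() - 1):
        values = mr_stage(values, j)
    return values


def naive_dft(x):
    """Direct O(N^2) DFT, used only to check the result."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]
```

Because `butterfly_emit` depends only on $(i, j, N)$, the map calls within a stage need no communication with each other; all data exchange is confined to the grouping by key.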

Experiments

In the experiments, two sets of test data are used to verify the MR-Butterfly algorithm. The test data sizes are $2^{17}$, $2^{18}$, $2^{19}$, $2^{20}$, $2^{21}$, $2^{22}$, $2^{23}$ and $2^{24}$.

The experimental environment is: 3 PCs, each with an Intel dual-core 2.4 GHz processor, 2 GB of memory and a 250 GB hard disk; operating system Ubuntu 12.04 LTS; Hadoop version 1.1.2.

For real fast Fourier transforms, the speed can be normalized and expressed in FLOPS (floating-point operations per second). FLOPS is defined as follows:

$$FLOPS = \frac{2.5 N \log_2 N}{\Delta_{FFT}}$$

where $\Delta_{FFT}$ is the time in seconds taken to perform one fast Fourier transform.
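As a small worked example of this normalization, the following sketch evaluates the formula for a given problem size and running time. The function name `fft_flops` and the 10-second timing are hypothetical and are not measurements from the experiments.

```python
import math


def fft_flops(n, seconds):
    """Normalized speed of a real FFT of length n: 2.5 * N * log2(N) / time."""
    return 2.5 * n * math.log2(n) / seconds


# Hypothetical timing: a 2^17-point transform that takes 10 seconds.
print(fft_flops(2 ** 17, 10.0))
```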

Figure 3. Operation execution time. (a) Computation time (s) versus Log2(Problem Size); (b) computation time (s) of the Combine, Add, Sort and Shuffle operations versus Log2(Problem Size). The horizontal axis of both panels runs from 17 to 24.

Figure 3-(a) shows the relationship between the problem size and the execution time of the MR-Butterfly algorithm: the execution time increases rapidly with the problem size. As shown in Figure 3-(b), the execution time of every partial operation of the algorithm also increases with the problem size. However, the sort execution time grows noticeably faster than that of the other operations, which suggests that the sort operation is the main bottleneck of the MR-Butterfly algorithm. In future studies, the MR-Butterfly algorithm will be validated on large-scale clusters.

Conclusion

In this paper, based on an analysis of the butterfly computation structure of the radix-2 decimation-in-time Cooley-Tukey algorithm, a fast Fourier transform algorithm, MR-Butterfly, suitable for big data is proposed. The algorithm exploits the fact that the butterfly computation structure of the fast Fourier transform can be determined in advance, and processes the complex multiplications and complex additions of each Cooley-Tukey butterfly unit in batches. The MR-Butterfly algorithm does not need to deal with synchronization and communication, and its structure is simple, robust, general and scalable. The experimental part verifies the algorithm using a large volume of test data.

Acknowledgement

This work is supported by the National Key R&D Program of China under Grant 2017YFC1405600, the National Science Foundation for Young Scientists of China under Grant 41706101, and the Qingdao City Southern District Science and Technology Development Fund under Grant 2016-2-012-ZH.

References

[1] Sze T W. Schönhage-Strassen algorithm with MapReduce for multiplying terabit integers[C]//Proceedings of the 2011 International Workshop on Symbolic-Numeric Computation. ACM, 2012: 54-62.

[2] Agrawal R, Faloutsos C, Swami A. Efficient similarity search in sequence databases[J]. Foundations of data organization and algorithms, 1993: 69-84.

[3] Cooley J W, Tukey J W. An algorithm for the machine calculation of complex Fourier series[J].

[4] Chen L, Gao G R. Performance analysis of Cooley-Tukey FFT algorithms for a many-core architecture[C]//Proceedings of the 2010 Spring Simulation Multiconference. Society for Computer Simulation International, 2010: 81.

[5] Bahn J H, Yang J, Bagherzadeh N. Parallel FFT algorithms on network-on-chips[C]//Information Technology: New Generations, 2008. ITNG 2008. Fifth International Conference on. IEEE, 2008: 1087-1093.

[6] Aarnio T. Parallel data processing with MapReduce[C]//TKK T-110.5190, Seminar on
