
NOVEL BIT SERIAL ARCHITECTURE BASED K-BEST DECODER FOR MIMO DETECTION

Shirly Edward A.¹ and Malarvizhi S.²

¹Department of Electronics & Communication Engineering, SRM University, Vadapalani, Chennai, India
²Department of Electronics & Communication Engineering, SRM University, Kattankulathur, Chennai, India

E-Mail: [email protected]

ABSTRACT

Multiple-Input Multiple-Output (MIMO) systems are widely studied and have been included in several wireless communication standards because they offer large gains in system capacity and link reliability. At the receiver, Maximum Likelihood (ML) detection gives superior performance, but its VLSI implementation is infeasible because of its complexity. This paper therefore proposes a modified K-best detection algorithm and presents its architectural implementation using Distributed Arithmetic (DA), which is bit-serial in nature. The hardware implementation targets a Xilinx Virtex FPGA, and the results are tabulated. A comparison of resource utilization with the literature shows a substantial reduction in hardware complexity. Our design achieves a decoding throughput of up to 39 Mbps.

Keywords: Modified K-best algorithm, maximum likelihood, FPGA, VLSI implementation, throughput.

INTRODUCTION

Modern wireless communication systems depend on signal processing and VLSI technology. The main challenge for researchers is to devise new architectures and implementation methods that support ever-higher data rates and better quality of service. Multiple-Input Multiple-Output (MIMO) systems are currently deployed to meet these high-performance requirements, since they improve both system capacity and communication reliability [1]-[2]. Wireless communication standards such as IEEE 802.11n (Wi-Fi), IEEE 802.16m (WiMAX) and 3GPP LTE use MIMO technology to raise data rates to hundreds of megabits per second (Mbps). However, the receiver side of such a system has considerable hardware complexity, so detection algorithms that approach optimal performance at low hardware complexity are needed.

The Maximum Likelihood (ML) detector is the optimal detector for spatially multiplexed MIMO systems, but its complexity, which grows exponentially with the number of transmit antennas, makes it impractical to realize. To overcome this computational complexity, tree-search-based algorithms called sphere decoders (SD) have been proposed [3]-[5]. Three approaches to SD have been studied and implemented: depth-first search [6], breadth-first search [7] and best-first search. The Schnorr-Euchner (SE) sphere decoder, which follows a depth-first search, achieves ML performance [8] but has an intrinsically variable throughput that decreases at low signal-to-noise ratio (SNR). Breadth-first search algorithms are preferred because they achieve near-ML performance with constant throughput and exhibit higher parallelism.

In this paper, we present an implementation of the modified K-best algorithm for a 2x2 MIMO system with 4QAM and 16QAM modulation using Distributed Arithmetic (DA). DA is an efficient, bit-serial technique for computing sums of products (multiply-accumulate operations). The proposed architecture operates in parallel, achieves near-optimal performance and has lower hardware complexity than the implementations in the literature.

The rest of this paper is organized as follows. Section II discusses the MIMO system model, ML detection and sphere decoding. Section III discusses the bit-serial Distributed Arithmetic technique. Section IV presents the proposed sphere decoder architecture for the modified K-best algorithm using DA. Section V lists the hardware implementation results and compares them with previous work. Section VI concludes the paper.

MIMO SYSTEM MODEL

System Model

We consider an NxM complex-valued baseband MIMO system with N transmit and M receive antennas. The equivalent complex-valued MIMO channel between the transmitter and the receiver is described by an M x N channel matrix H. The M-dimensional received signal vector is given by

$$ y = Hs + n \qquad (1) $$

where $s = [s_1\; s_2\; \dots\; s_N]^T$ is the N-dimensional transmit signal vector and n is the M-dimensional noise vector, whose elements are independent, complex, zero-mean Gaussian with equal variance σ² in both the real and imaginary parts. The transmitted symbols are chosen independently from a set $\mathcal{O}$ of complex-valued constellation points with Q bits per scalar symbol, i.e.,

$$ |\mathcal{O}| = 2^{Q} \qquad (2) $$
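The system model in (1) and (2) can be simulated in a few lines. The sketch below is a minimal illustration, assuming 4QAM (Q = 2) with the unnormalized alphabet {±1 ± 1j}, an i.i.d. complex Gaussian channel and noise of standard deviation sigma per real/imaginary part; the function names `qam4_constellation` and `transmit` are hypothetical and not part of the paper.

```python
import random

def qam4_constellation():
    """4QAM alphabet O = {±1 ± 1j}; |O| = 2**Q with Q = 2 bits per symbol."""
    return [complex(re, im) for re in (-1, 1) for im in (-1, 1)]

def transmit(H, s, sigma):
    """Return y = H s + n for an M x N channel H given as a list of M rows.

    Each noise sample has independent zero-mean Gaussian real and imaginary
    parts with standard deviation sigma (variance sigma**2 per part).
    """
    y = []
    for row in H:
        noise = complex(random.gauss(0.0, sigma), random.gauss(0.0, sigma))
        y.append(sum(h * x for h, x in zip(row, s)) + noise)
    return y

# Example: a hypothetical 2x2 channel with i.i.d. complex Gaussian entries.
random.seed(0)
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2)] for _ in range(2)]
s = random.choices(qam4_constellation(), k=2)
y = transmit(H, s, sigma=0.1)
```

Setting sigma to zero gives the noiseless product Hs, which is convenient for checking a detector.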

ML Detection and Sphere Decoders

The ML estimate of the transmitted vector s from the received signal vector y, assuming perfect knowledge of the channel matrix H at the receiver, is given by

$$ \hat{s}_{ML} = \arg\min_{s \in \mathcal{O}^{N}} \| y - Hs \|^2 \qquad (2) $$
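To make the cost of exhaustive ML detection concrete, the sketch below evaluates every one of the |O|^N candidate vectors (16 for 2x2 4QAM, 256 for 2x2 16QAM). It is a brute-force reference model only; `ml_detect` is a hypothetical name, and the channel and symbols in the usage are made up.

```python
from itertools import product

def ml_detect(H, y, constellation):
    """Exhaustive ML search: returns (argmin ||y - Hs||^2, minimum metric)."""
    n_tx = len(H[0])
    best, best_metric = None, float("inf")
    for cand in product(constellation, repeat=n_tx):   # all |O|**N vectors
        metric = sum(abs(yi - sum(h * x for h, x in zip(row, cand))) ** 2
                     for yi, row in zip(y, H))
        if metric < best_metric:
            best, best_metric = list(cand), metric
    return best, best_metric

# Usage with a hypothetical noiseless 2x2 channel and 4QAM symbols.
const = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]
H = [[1, 0.5], [0.2, 1]]
s = [1+1j, -1+1j]
y = [sum(h * x for h, x in zip(row, s)) for row in H]
s_hat, metric = ml_detect(H, y, const)
```

The loop count grows as |O|^N, which is exactly why the tree-search methods below are needed.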

The ML solution can be obtained through an exhaustive search, but this has high complexity in MIMO systems with higher-order modulation schemes. Therefore, Sphere Decoding (SD) algorithms have been proposed; their fundamental idea is to use tree search to reduce the number of candidate symbol vectors that must be considered in the search for the ML solution. SD uses an efficient pruning criterion to minimize the number of visited nodes: it considers only the lattice points that lie inside a sphere of a given radius r. The sphere constraint is given by

$$ \| y - Hs \|^2 \le r^2 \qquad (3) $$

where r is the radius of the sphere and y is its center. Inequality (3) can be rewritten as

$$ (y - Hs)^H (y - Hs) \le r^2 \qquad (4) $$

After QR decomposition of the channel matrix, H = QR, where $QQ^H = Q^HQ = I$ and R is an upper triangular matrix, (4) becomes

$$ (Q^H y - Rs)^H (Q^H y - Rs) \le r^2 \qquad (5) $$

$$ \| Q^H y - Rs \|^2 \le r^2 \qquad (6) $$

Equation (6) can be written as

$$ \| \tilde{y} - Rs \|^2 \le r^2 \qquad (7) $$

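where $\tilde{y} = Q^H y$. Because R is upper triangular, (7) can be evaluated row by row, starting from the last row:

$$ d_i(s) = d_{i+1}(s) + \Big| \tilde{y}_i - \sum_{j=i}^{N} R_{i,j}\, s_j \Big|^2, \qquad d_{N+1}(s) = 0, \qquad d_1(s) = \| \tilde{y} - Rs \|^2 \le r^2 \qquad (8) $$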

We define $d_i(s)$ as the Partial Euclidean Distance (PED). With an appropriate search strategy, one can simply set r = ∞ with virtually no penalty in complexity [9].
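The row-by-row PED recursion can be checked numerically. The sketch below is a minimal, real-valued illustration with hypothetical values of R, ỹ and s (none are from the paper): it computes d_i = d_{i+1} + |ỹ_i − Σ_{j≥i} R_{i,j} s_j|² from the last row upward and returns all PEDs. Because every added term is nonnegative, d_i never decreases as the search moves down the tree, which is what lets a sphere decoder discard a partial vector as soon as its PED exceeds r². The function name `peds` is ours, and indices are 0-based in the code.

```python
def peds(R, y_tilde, s):
    """Return [d_1, ..., d_N] for upper-triangular R (0-based lists in code).

    d_i = d_{i+1} + |y_tilde_i - sum_{j >= i} R[i][j] * s[j]|^2, d_{N+1} = 0.
    """
    N = len(s)
    d = [0.0] * (N + 1)                 # d[N] plays the role of d_{N+1} = 0
    for i in range(N - 1, -1, -1):      # start at the last row (top tree layer)
        partial = sum(R[i][j] * s[j] for j in range(i, N))
        d[i] = d[i + 1] + abs(y_tilde[i] - partial) ** 2
    return d[:N]

# Hypothetical real-valued example: d[0] equals the full distance ||y_tilde - R s||^2.
R = [[2, 1, 0.5], [0, 1.5, -1], [0, 0, 1]]
d = peds(R, [1, 2, -1], [1, -1, 1])
```

Here the residuals are −0.5, 4.5 and −2, so the PEDs are 24.5, 24.25 and 4.0, and d_1 = 24.5 matches ‖ỹ − Rs‖².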

DISTRIBUTED ARITHMETIC

Distributed Arithmetic (DA) is a bit-serial computation technique that replaces explicit multiplications with ROM lookups [10]. Its main advantage is computational efficiency, and it maps efficiently onto Field Programmable Gate Arrays (FPGAs). In DA, the multiplications are reordered so that the arithmetic is distributed through the architecture rather than lumped into multipliers. When DA is implemented on an FPGA, the on-chip memory can be used to implement the multiply-accumulate operation.
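The look-up-table form of DA can be sketched as follows for an inner product Σ_k c_k x_k with constant coefficients c_k. A table of all 2^K partial sums of the coefficients is precomputed; then, for each bit position b, the b-th bits of all inputs form a table address, and the fetched value is accumulated with weight 2^b. This is a minimal software model under stated assumptions: inputs are n-bit two's-complement integers, so the MSB carries weight −2^(n−1) (a common DA convention, assumed here; an unsigned encoding would simply use +2^(n−1)). The names `build_lut` and `da_inner_product` are hypothetical.

```python
def build_lut(coeffs):
    """LUT[m] = sum of coeffs[k] whose bit k is set in address m (2**K entries)."""
    K = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (m >> k) & 1) for m in range(1 << K)]

def da_inner_product(coeffs, xs, n_bits):
    """Bit-serial sum(c * x): one LUT access per bit, no multipliers on the data path.

    xs are n_bits-wide two's-complement integers (assumption).
    """
    lut = build_lut(coeffs)
    acc = 0
    for b in range(n_bits):
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k     # bit b of every input forms the address
        # Weight 2**b is a shift in hardware; the MSB is negative in two's complement.
        weight = -(1 << b) if b == n_bits - 1 else (1 << b)
        acc += weight * lut[addr]
    return acc

# Hypothetical example: 3*(-3) + (-2)*1 + 5*3 computed in 4 bit-serial steps.
result = da_inner_product([3, -2, 5], [-3, 1, 3], 4)
```

The table size grows as 2^K in the number of coefficients, which is why DA suits short sums of products such as a row of R.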

Application of DA Algorithm

In (8), the sum-of-products term is $\sum_{j=i}^{N} R_{i,j}\, s_j$. By applying the DA algorithm, each symbol $s_j$ can be expressed as

$$ s_j = \sum_{b=0}^{n-1} s_{b,j}\, 2^{b} \qquad (9) $$

where n is the number of bits used to represent the symbol $s_j$ and $s_{b,j}$ is its b-th bit. Assuming N = 4 and n = 4, the expansion becomes

$$ s_j = s_{0,j}\, 2^{0} + s_{1,j}\, 2^{1} + s_{2,j}\, 2^{2} + s_{3,j}\, 2^{3} \qquad (10) $$

where $s_{0,j}$ is the Least Significant Bit (LSB) and $s_{3,j}$ is the Most Significant Bit (MSB). The PED equation can then be written as

$$ d_i(s) = d_{i+1}(s) + \Big| \tilde{y}_i - \sum_{j=i}^{N} \sum_{b=0}^{n-1} R_{i,j}\, s_{b,j}\, 2^{b} \Big|^2 \qquad (11) $$

In this way, the multiplication of the constant entries $R_{i,j}$ by the symbols is converted into shift and add operations.
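As a worked instance of (9) and (10), take a hypothetical 4-bit unsigned symbol value $s_j = 6$ (chosen only for illustration; constellation symbols are signed in practice, and a two's-complement encoding would give the MSB a weight of $-2^{3}$):

$$ s_j = 6 = (0110)_2 \;\Rightarrow\; s_{0,j} = 0,\ s_{1,j} = 1,\ s_{2,j} = 1,\ s_{3,j} = 0 $$

$$ R_{i,j}\, s_j = \sum_{b=0}^{3} R_{i,j}\, s_{b,j}\, 2^{b} = R_{i,j} \cdot 2^{1} + R_{i,j} \cdot 2^{2} = 2R_{i,j} + 4R_{i,j} = 6R_{i,j} $$

Each set bit contributes a copy of $R_{i,j}$ shifted left by b positions, so the product is formed with shifts and additions only.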

PROPOSED SPHERE DECODER ARCHITECTURE

A typical sphere decoder for MIMO detection consists of a pre-processing unit and a sphere detector unit, as shown in Figure-1. The pre-processing unit takes the estimated channel matrix H and the received signal vector y as input and computes the QR decomposition of H. The upper triangular matrix R is fed to the sphere detector unit, and the conjugate transpose of the orthogonal matrix Q is multiplied with the received signal vector y to form $\tilde{y} = Q^H y$, which is also fed to the sphere detector unit.
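The pre-processing step can be modelled in software as below. This is a sketch, not the hardware unit of Figure-1: it uses classical Gram-Schmidt on a small square complex channel (hardware pre-processors commonly use Givens rotations instead) and then forms ỹ = Q^H y. The names `qr_gram_schmidt` and `rotate_received` are hypothetical.

```python
def qr_gram_schmidt(H):
    """Return (Q, R) with H = Q R for a square complex matrix (list of rows).

    Classical Gram-Schmidt; adequate for a small, well-conditioned channel such as 2x2.
    """
    n = len(H)
    cols = [[H[r][c] for r in range(n)] for c in range(n)]   # columns h_c of H
    q_cols = []
    R = [[0j] * n for _ in range(n)]
    for c in range(n):
        v = list(cols[c])
        for k, q in enumerate(q_cols):
            # R[k][c] = q_k^H h_c, then remove that component from v
            R[k][c] = sum(qi.conjugate() * hi for qi, hi in zip(q, cols[c]))
            v = [vi - R[k][c] * qi for vi, qi in zip(v, q)]
        norm = sum(abs(vi) ** 2 for vi in v) ** 0.5
        R[c][c] = complex(norm)                                # real, positive diagonal
        q_cols.append([vi / norm for vi in v])
    Q = [[q_cols[c][r] for c in range(n)] for r in range(n)]  # back to row-major
    return Q, R

def rotate_received(Q, y):
    """Compute y_tilde = Q^H y, the vector fed to the sphere detector unit."""
    n = len(Q)
    return [sum(Q[r][c].conjugate() * y[r] for r in range(n)) for c in range(n)]
```

Because Q is unitary, ‖y − Hs‖² and ‖ỹ − Rs‖² agree for every candidate s, so the detector can work with the triangular system alone.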


The complexity of a sphere decoder depends on the search strategy used. This paper considers the K-best algorithm, which follows a breadth-first search and does not require a sphere constraint. The algorithm searches only in the forward direction, and at each level the K symbol sets with the smallest PEDs are carried forward. The parameter K determines the complexity of the algorithm, and it must be chosen large enough that performance does not degrade noticeably relative to the ML solution. To overcome the difficulty of choosing K, the modified K-best algorithm [11] was proposed.
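A compact software model of the conventional breadth-first K-best search is sketched below. It assumes a real-valued upper-triangular system (R, ỹ) and a hypothetical real alphabet such as {−3, −1, 1, 3}; Python's `sort` stands in for the hardware sorter. The function name `k_best` is ours, and this is the standard K-best search, not the modified variant of [11].

```python
def k_best(R, y_tilde, alphabet, K):
    """Breadth-first K-best search on a real-valued upper-triangular system.

    Layers are visited from the last row (index N-1) to the first (index 0).
    Each survivor is (partial, ped), where partial = (s_i, ..., s_{N-1}).
    """
    N = len(y_tilde)
    survivors = [((), 0.0)]                      # root node with PED 0
    for i in range(N - 1, -1, -1):
        children = []
        for partial, ped in survivors:
            for x in alphabet:                   # expand every child of every survivor
                cand = (x,) + partial            # cand[k] holds s_{i+k}
                residual = y_tilde[i] - sum(R[i][i + k] * cand[k] for k in range(len(cand)))
                children.append((cand, ped + residual ** 2))
        children.sort(key=lambda node: node[1])  # stand-in for the hardware sorter
        survivors = children[:K]                 # keep the K smallest PEDs
    best_vector, best_ped = survivors[0]
    return list(best_vector), best_ped
```

Each layer expands K·|O| children and sorts them, so the work per layer is fixed; the choice of K trades this work against the risk of pruning the ML path early.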

The modified algorithm employs a parallel, distributed sorting strategy [12] to prune nodes in the search tree. Starting from the root node, the PEDs of all constellation symbols are computed in the top layer. Each node is then expanded into its child nodes, and their PEDs are computed. Then, for each parent node, only the child with the minimum PED is selected and carried forward to the next layer, where it is expanded in turn. The algorithm continues until the tree reaches the leaf nodes. Parallel comparison and elimination of nodes in each layer gives the algorithm a fixed complexity.
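One possible reading of the per-parent pruning described above is sketched below: every constellation symbol survives at the top layer, and thereafter each parent keeps only its best child, so the number of survivors stays equal to |O| at every layer. This is our interpretation for illustration, not the exact algorithm of [11]; it assumes the same real-valued upper-triangular system as a plain K-best search, and `modified_k_best` is a hypothetical name.

```python
def modified_k_best(R, y_tilde, alphabet):
    """Per-parent pruning (our reading): one surviving child per parent node.

    Returns (candidate tuple (s_0, ..., s_{N-1}), final PED).
    """
    N = len(y_tilde)
    top = N - 1
    # Top layer: one survivor per constellation symbol.
    survivors = [((x,), (y_tilde[top] - R[top][top] * x) ** 2) for x in alphabet]
    for i in range(N - 2, -1, -1):
        next_survivors = []
        for partial, ped in survivors:           # parents are independent: parallel in hardware
            best = None
            for x in alphabet:
                cand = (x,) + partial            # cand[k] holds s_{i+k}
                residual = y_tilde[i] - sum(R[i][i + k] * cand[k] for k in range(len(cand)))
                child = (cand, ped + residual ** 2)
                if best is None or child[1] < best[1]:
                    best = child
            next_survivors.append(best)          # exactly one child per parent
        survivors = next_survivors
    return min(survivors, key=lambda node: node[1])   # final choice among the leaves
```

Because each parent only needs a local minimum rather than a global sort, the comparisons in one layer can run concurrently, which matches the fixed-complexity argument above.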

To simplify the architectural implementation of the proposed algorithm and to reduce its hardware complexity, the DA algorithm is incorporated into the hardware architecture. To this end, the PED in (8) is rewritten as

$$ d_i(s) = d_{i+1}(s) + |e_i(s)|^2 \qquad (12) $$

$$ |e_i(s)|^2 = \Big| \tilde{y}_i - \sum_{j=i+1}^{N} R_{i,j}\, s_j - R_{i,i}\, s_i \Big|^2 = |b_{i+1} - R_{i,i}\, s_i|^2 \qquad (13) $$

where

$$ b_{i+1} = \tilde{y}_i - \sum_{j=i+1}^{N} R_{i,j}\, s_j \qquad (14) $$

Applying the DA algorithm, (13) becomes

$$ |e_i(s)|^2 = \Big| b_{i+1} - \sum_{b=0}^{n-1} R_{i,i}\, s_{b,i}\, 2^{b} \Big|^2 \qquad (15) $$

Similarly, (14) becomes

$$ b_{i+1} = \tilde{y}_i - \sum_{j=i+1}^{N} \sum_{b=0}^{n-1} R_{i,j}\, s_{b,j}\, 2^{b} \qquad (16) $$

The Metric Computation Unit shown in Figure-2 computes the PEDs in all layers of the tree. The $b_{i+1}$ computation unit evaluates (16), and the norm calculation unit computes $|e_i(s)|^2$ in (15). Finally, the adder block adds the minimum PED from the previous layer to the output of the norm calculation unit, as in (12).
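A behavioural model of the three blocks of the Metric Computation Unit is sketched below: the b_{i+1} unit evaluates (16) with shift-and-add terms, the norm step squares e_i from (15), and the adder applies (12). Assumptions: symbols are n-bit two's-complement integers so that negative values such as −3 can be represented (the expansion in (9) is written with unsigned weights), and R is real-valued. Multiplying by ±2^b stands in for a hardware shift. The helper names `bits`, `weight`, `b_unit` and `metric_unit` are hypothetical.

```python
def bits(x, n):
    """Two's-complement bits s_{b} of integer x, LSB first."""
    return [(x >> b) & 1 for b in range(n)]

def weight(b, n):
    """Bit weight 2**b; the MSB carries -2**(n-1) in two's complement (assumption)."""
    return -(1 << b) if b == n - 1 else (1 << b)

def b_unit(y_i, R_row, s, i, n):
    """Eq. (16): b_{i+1} = y_i - sum_{j>i} sum_b R_{i,j} s_{b,j} 2^b."""
    acc = 0.0
    for j in range(i + 1, len(s)):
        for b, bit in enumerate(bits(s[j], n)):
            if bit:
                acc += R_row[j] * weight(b, n)   # hardware: R_{i,j} shifted by b bits
    return y_i - acc

def metric_unit(d_prev, y_i, R_row, s, i, n=4):
    """Eqs. (15) and (12): d_i = d_{i+1} + |b_{i+1} - R_{i,i} s_i|^2."""
    b_next = b_unit(y_i, R_row, s, i, n)
    own = sum(R_row[i] * weight(b, n) for b, bit in enumerate(bits(s[i], n)) if bit)
    e_i = b_next - own                           # e_i(s) from eq. (15)
    return d_prev + e_i ** 2                     # norm calculation unit + adder

# Hypothetical 3-layer example, evaluated from the top layer down.
R = [[2, 1, 0.5], [0, 1.5, -1], [0, 0, 1]]
y, s = [1, 2, -1], [1, -1, 1]
d3 = metric_unit(0.0, y[2], R[2], s, 2)
d2 = metric_unit(d3, y[1], R[1], s, 1)
d1 = metric_unit(d2, y[0], R[0], s, 0)
```

For this example the chain yields 4.0, 24.25 and 24.5, the same PEDs as a direct evaluation of ‖ỹ − Rs‖² row by row.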

Figure-2. Metric computation unit.

Using the DA algorithm, the sums of products in these equations are converted into shift and add operations. The computed PEDs are sorted by the modified bitonic sorter unit shown in Figure-3, which follows an efficient merge-based algorithm [13] that sorts in parallel and has lower complexity than other sorters.
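For readers unfamiliar with bitonic sorting, the sketch below shows the compare-exchange structure of a standard bitonic sorting network. It is the textbook (unmodified) network, not the modified sorter of Figure-3 or the merge-based algorithm of [13], and it assumes the input length is a power of two. All compare-exchanges within one inner pass touch disjoint pairs, which is what allows hardware to perform them in parallel. `bitonic_sort` is a hypothetical name.

```python
def bitonic_sort(values):
    """Sort a power-of-two-length list with a bitonic sorting network."""
    a = list(values)
    n = len(a)
    assert n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:                        # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:                     # compare distance inside each merge stage
            for i in range(n):           # independent pairs: one parallel stage in hardware
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Sorting (PED, candidate) tuples with this function orders candidates by PED, since Python compares tuples element by element.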

Figure-3. Sorting unit.

IMPLEMENTATION AND RESULTS

The Xilinx System Generator blockset [14], used within the Simulink block-level modeling tool of MATLAB, enables high-level modeling and provides options for implementing signal processing applications. It is a powerful tool for hardware/software co-design and provides access to distributed and block memory and to embedded multipliers. System Generator maps its high-level blocks to Intellectual Property (IP) cores for efficient implementation in the target Xilinx FPGA.


and it is fed as input to the preprocessing unit. After QR decomposition, the matrix R and the vector $\tilde{y}$ are fed as input to the sphere detector unit. For a 2x2 MIMO system, there are four layers in the tree structure. Figure-4 shows the DA unit, built in System Generator, that implements the double summation in (16). Here, each symbol is represented with n = 4 bits, and the DA unit computes the double summation in one clock cycle. The upper triangular matrix R is stored in the Block RAM of the Xilinx blockset.

The decoding throughput of the modified K-best algorithm is given by

$$ \text{Throughput} = \frac{f_{c,\max} \cdot \log_2(M) \cdot N}{C} \qquad (17) $$

where $f_{c,\max}$ is the maximum clock frequency, M is the constellation size (not the number of receive antennas M used in (1)), N is the number of antennas and C is the number of clock cycles needed to compute the PEDs in layer 1 of the tree. For our design, C = 19 clock cycles for 2x2 MIMO with 4QAM and C = 21 clock cycles for 2x2 MIMO with 16QAM. The resulting decoding throughputs are 22.6 Mbps and 39 Mbps, respectively, as listed in Table-1.
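Equation (17) is easy to evaluate or invert. The helpers below are a hypothetical sketch (the names are ours) that treat (17) as exact: `kbest_throughput` computes throughput from a clock frequency, and `implied_clock` solves for the clock frequency that a reported throughput implies.

```python
import math

def kbest_throughput(f_clk_hz, constellation_size, n_antennas, cycles):
    """Eq. (17): log2(M) * N bits are detected every C clock cycles."""
    bits_per_vector = math.log2(constellation_size) * n_antennas
    return f_clk_hz * bits_per_vector / cycles

def implied_clock(throughput_bps, constellation_size, n_antennas, cycles):
    """Invert eq. (17): clock frequency needed for a given throughput."""
    return throughput_bps * cycles / (math.log2(constellation_size) * n_antennas)

for label, T, M, C in [("4QAM", 22.6e6, 4, 19), ("16QAM", 39e6, 16, 21)]:
    print(label, round(implied_clock(T, M, 2, C) / 1e6, 3), "MHz")
```

With the figures above, (17) implies clock frequencies of about 107.35 MHz (4QAM) and 102.375 MHz (16QAM). These are back-calculated from (17), not values reported in the paper.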

Table-1 compares hardware resource utilization and throughput with the literature. Both of our implementations use far fewer slices than [15]. For 2x2 MIMO with 4QAM, our design uses the same number of DSP processing elements as [15], whereas for 16QAM it uses more processing elements, which is accepted as a trade-off in the implementation. The implementations in [16] target Application Specific Integrated Circuits (ASICs), which allow components to be placed independently; therefore, our FPGA implementation cannot reach the same decoding throughput.

Figure-4. DA unit using Xilinx system generator.

Table-1. Comparison of FPGA resource utilization.

CONCLUSIONS

This paper presents an architectural implementation of the modified K-best algorithm that uses the DA concept for the sum-of-products terms. This removes the hardware cost incurred by the multipliers of a generic design. The parallel implementation reduces the number of nodes visited by the algorithm and therefore the number of clock cycles required to compute the PEDs. The sorting complexity is also reduced in our design, which leads to an efficient implementation. In future work, the proposed technique will be extended to higher-order modulation and different antenna configurations for emerging wireless standards.

REFERENCES

[1] G. J. Foschini and M. Gans, "On the limits of wireless communication in a fading environment when using multiple antennas," Wireless Personal Communications, vol. 6, pp. 311-335, 1998.

[2] D. Gesbert, M. Shafi, S. Da-shan, P. J. Smith and A. Naguib, "From theory to practice: An overview of MIMO space-time coded wireless systems," IEEE Journal on Selected Areas in Communications, vol. 21, no. 3, pp. 281-302, March 2003.

[3] U. Fincke and M. Pohst, "Improved methods for calculating vectors of short length in a lattice, including a complexity analysis," Mathematics of Computation, vol. 44, no. 170, pp. 463-471, April 1985.

[4] E. Viterbo and J. Boutros, "A universal lattice decoder for fading channels," IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1639-1642, July 1999.

[5] B. Hochwald and S. ten Brink, "Achieving near-capacity on a multiple-antenna channel," IEEE Transactions on Communications, vol. 51, pp. 389-399, March 2003.

[6] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner and H. Bölcskei, "VLSI implementation of MIMO detection using the sphere decoding algorithm," IEEE Journal of Solid-State Circuits, vol. 40, no. 7, pp. 1566-1577, July 2005.

[7] Z. Guo and P. Nilsson, "A 53.3 Mb/s 4x4 16QAM MIMO decoder in 0.35 µm CMOS," in Proc. IEEE ISCAS, vol. 5, pp. 4947-4950, May 2005.

[8] E. Agrell, T. Eriksson, A. Vardy et al., "Closest point search in lattices," IEEE Transactions on Information Theory, vol. 48, pp. 2201-2214, 2002.

[9] M. O. Damen, H. El Gamal and G. Caire, "On maximum-likelihood detection and the search for the closest lattice point," IEEE Transactions on Information Theory, vol. 49, no. 10, pp. 2389-2402, October 2003.

[10] S. A. White, "Applications of distributed arithmetic to digital signal processing: A tutorial review," IEEE ASSP Magazine, July 1989.

[11] A. Shirly Edward and S. Malarvizhi, "Modified K-best algorithm for MIMO systems," ARPN Journal of Engineering and Applied Sciences, vol. 10, no. 5, pp. 2284-2288, March 2015.

[12] B. Kim and I.-C. Park, "K-best MIMO detection based on interleaving of distributed sorting," Electronics Letters, vol. 44, no. 1, 2008.

[13] A. Shirly Edward and S. Malarvizhi, "Reduced complexity K-best decoder for LTE standard," International Journal of Multimedia and Ubiquitous Engineering, vol. 10, no. 3, pp. 397-406, 2015.

[14] System Generator: Reference Guide, http://www.xilinx.com/

[15] J. Ketonen, M. Juntti and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360-3372, 2010.

[16] Z. Guo and P. Nilsson, "Algorithm and implementation of the K-best sphere decoding for MIMO detection," IEEE Journal on Selected Areas in Communications, vol. 24, no. 3, pp. 491-503, March 2006.
