FPGA Implementation of Multi-User Detection Genetic Algorithm Tool for SDMA-OFDM Systems


Article in Wireless Personal Communications · July 2015

DOI: 10.1007/s11277-015-2986-x


Mohammed Alansi¹ · Ibrahim Elshafiey² · Abdulhameed Al-Sanie² · Ahmed Mabrouk³

Published online: 31 July 2015

Springer Science+Business Media New York 2015

Abstract Robust multi-user detection (MUD) methods based on space division multiple access (SDMA) techniques are essential to efficiently exploit the electromagnetic spectrum. In this paper, an adaptive Genetic Algorithm-based tool for SDMA-OFDM Systems (GASOS) is developed to improve performance and reduce computational complexity in fully-loaded and overloaded multi-user scenarios. The data flow in GASOS is well suited to pipelining and parallelization, which reduces operational time. A new GASOS-based MUD hardware design for SDMA-OFDM systems is proposed using an FPGA architecture. The design details are presented together with the planned operational modules. Resource utilization is optimized, and the total number of clock cycles required is found to be 15 initially, plus one clock cycle per member of the algorithm population. A clock frequency of 100 MHz is used, and implementation is carried out on a Xilinx Virtex-6 FPGA on the ML605 development platform with JTAG hardware co-simulation. According to the results obtained from the developed algorithm and implementation tools, a high number of users can feasibly be supported. Real-time implementation of MUD systems has the potential to play a major role in next-generation communication systems.

Keywords MUD · SDMA and OFDM systems · Genetic algorithms · FPGA · Xilinx

Corresponding author: Mohammed Alansi, [email protected]

¹ Electrical Engineering Department, College of Engineering at Wadi Aldawaser, Prince Sattam bin Abdulaziz University, Wadi Aldawaser, Riyadh Region 11991, Saudi Arabia

² Electrical Engineering Department, College of Engineering, King Saud University, Riyadh, Saudi Arabia

³ Electrical Engineering Department, University of Malaya, Kuala Lumpur 50603, Malaysia


1 Introduction

Future communication system design depends primarily on achieving optimum exploitation of the available spectrum to accommodate growing traffic demands. The most promising way to accomplish this goal is to utilize the spatial dimension in space division multiple access (SDMA) systems. SDMA has recently received increased attention because it can increase the sum data rate of different users by creating a virtual Multiple Input Multiple Output (MIMO) link between multiple users and a base station (BS). Combining SDMA with Orthogonal Frequency Division Multiplexing (OFDM) is a promising technique that offers the benefits of both methods at the same time [1–4].

SDMA can support multiple users within the same time and frequency slots. By exploiting user-specific channel impulse responses (CIRs), higher bandwidth efficiency is achievable [5]. A spreading code in Code Division Multiple Access (CDMA) and the CIR in SDMA have a similar impact on the transmitted signal: each is convolved with the signal, and consequently CIR orthogonality is fundamental to successful system implementation. In practice, however, the non-orthogonal nature of CIRs requires powerful channel estimation and robust multi-user detection (MUD) techniques in order to distinguish user signals at the receiver.

Diverse linear and non-linear MUD techniques have been proposed to perform user separation. Maximum likelihood (ML) detection is optimal for MUD in terms of bit error rate (BER) because it performs an exhaustive search over all possible user signals [6]. Accordingly, ML techniques incur very high computational complexity that rises exponentially with the number of users and the modulation order. Thus, an ML algorithm cannot practically be realized in the receiver. In contrast, classic MUD schemes, such as Minimum Mean Square Error (MMSE) and Ordered Successive Interference Cancellation (OSIC), offer lower performance but have the advantage of low complexity. As a result, researchers have worked on developing suboptimum techniques and algorithms that can reach near-optimal performance with reduced computational complexity [7–12].

Among the suboptimum schemes, genetic algorithms (GAs) are renowned for their robustness in solving complex optimization problems. A GA is an optimization and search technique that relies on evolving a set of solutions over time. By combining different candidate solutions, an approximate or best possible solution may ultimately be found [13–18]. Moreover, the parallel processing capability of GAs makes them a good candidate for simplifying detection while maintaining BER performance comparable with optimum ML detection. GA-assisted optimization of classic MUD schemes was proposed by Juntti et al. [14] and Wang et al. [19]. Hanzo proposed a GA-assisted joint channel estimation and MUD approach for multi-user MIMO SDMA-OFDM systems, including an overloaded scenario, for use in Rayleigh fading channels [15, 20, 21].

The promising results of GAs in mitigating MUD problems, along with their capacity to improve conventional detector performance and realize near-optimal solutions, demonstrate the importance of validating and examining hardware implementation possibilities in real-time systems. Furthermore, the ease and speed of reprogramming the field programmable gate array (FPGA), together with its reduced implementation time, have helped solve a number of problems that are highly complex, computationally difficult, or very time consuming. These FPGA features, along with the parallel nature of GAs, make the FPGA an attractive platform for GA hardware implementation, which can significantly reduce processing time and accelerate performance [16, 22–24]. In addition, the authors in [24] presented a GA simulation tool applicable in the context of an SDMA-OFDM system for determining an optimized solution. An initial FPGA implementation block diagram of the GA to enhance system performance was then introduced in [25].

In this research, the major contributions can be grouped into two parts. First, an adaptive Genetic Algorithm-based tool for SDMA-OFDM Systems (GASOS) is developed to trade off performance against complexity in both fully-loaded and overloaded multi-user scenarios. Controlling the GASOS operational parameters is a powerful means of obtaining acceptable performance under complexity limitations. Furthermore, the performance of GASOS is compared to various classic (i.e. MMSE and OSIC), suboptimum (i.e. SD and QRM-MLD) and optimal ML MUD schemes under the reference ITU Rayleigh channel model. Second, a new GASOS hardware implementation design is introduced. This design is modeled and simulated using the Xilinx blockset in MATLAB Simulink and then implemented on hardware using the Xilinx System Generator. Design details illustrating the optimization of resource utilization are presented.

The rest of this paper is organized as follows. Section 2 describes the uplink SDMA-OFDM system, and Sect. 3 presents the detection techniques for SDMA-OFDM systems. Section 4 explains the GASOS tool methodology. Section 5 illustrates the simulation results. Section 6 shows the complexity analysis of GASOS implementation and Sect. 7 introduces the hardware implementation. Section 8 presents the hardware co-simulation results and finally, Sect. 9 presents the conclusions.

2 Uplink SDMA-OFDM System

As a subclass of MIMO schemes, SDMA enables multiple users at different geographical locations to share the same bandwidth. More precisely, the spatial dimension (i.e. the spatial signature of each user) can be exploited, which makes it possible to distinguish individual user signals [5].

Fig. 1 Schematic model of SDMA uplink channel system (users 1 to L at mobile stations MS 1 to MS L, the channel paths to receive antennas Rx 1 to Rx P with additive noise, and the P-element receive antenna array with MUD techniques)

Figure 1 illustrates the SDMA system concept: the transmitted signals of L simultaneous uplink mobile users, each equipped with a single transmit antenna, are received by the P different receiver antennas of the BS. At the BS, the received signal is degraded by Gaussian noise at the antenna array elements [20, 26], yielding:

$$\mathbf{x} = \mathbf{H}\mathbf{s} + \mathbf{n} \qquad (1)$$

where $\mathbf{x} = [x_1, x_2, \ldots, x_P]^T$ is the received signal vector, $\mathbf{s} = [s^{(1)}, s^{(2)}, \ldots, s^{(L)}]^T$ is the transmitted signal vector and $\mathbf{n} = [n_1, n_2, \ldots, n_P]^T$ is the noise vector. The $(P \times L)$-dimensional matrix $\mathbf{H} = [\mathbf{H}^{(1)}, \mathbf{H}^{(2)}, \ldots, \mathbf{H}^{(L)}]$ contains the frequency domain channel transfer functions (FD-CHTFs) of the L users, where $\mathbf{H}^{(l)}$ $(l = 1, \ldots, L)$ is the vector of the FD-CHTFs associated with the transmission paths from the lth user's transmit antenna to each element of the P-element receiver antenna array, expressed as:

$$\mathbf{H}^{(l)} = \left(H_1^{(l)}, H_2^{(l)}, H_3^{(l)}, \ldots, H_P^{(l)}\right)^T, \quad l = 1, 2, \ldots, L \qquad (2)$$

where $\mathbf{H}^{(l)}$ is independent for the different transmit stations.
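As a minimal sketch of the uplink model in Eq. (1), the following pure-Python snippet builds a received vector x = Hs + n for a small system. The sizes (L = 2 users, P = 3 antennas), the 4-QAM symbols, the noise level, the complex Gaussian channel entries, and the function name `sdma_uplink` are illustrative assumptions, not values taken from the paper's simulation tool.

```python
import random

def sdma_uplink(H, s, noise_std, rng):
    """Return x = H s + n, where H is a P x L matrix given as a list of rows."""
    x = []
    for row in H:  # one row of H per receive antenna
        signal = sum(h * sym for h, sym in zip(row, s))
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        x.append(signal + noise)
    return x

rng = random.Random(0)   # fixed seed so the sketch is repeatable
L, P = 2, 3              # hypothetical sizes: 2 users, 3 receive antennas
# Assumed channel: independent complex Gaussian entries (flat Rayleigh fading)
H = [[complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / 2 ** 0.5
      for _ in range(L)] for _ in range(P)]
s = [1 + 1j, -1 + 1j]    # one 4-QAM symbol per user
x = sdma_uplink(H, s, noise_std=0.1, rng=rng)
print(len(x))            # one received sample per antenna
```

Each entry of `x` mixes the symbols of all users, which is why a detector at the BS is needed to separate them.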

The combination of SDMA and OFDM benefits from both techniques and has attracted growing interest [27]. Figure 2 shows an SDMA-OFDM system model [5]. SDMA user data is encoded through conventional concatenated encoders, which are a cascade of a Reed-Solomon (RS) code as the outer code and a convolutional code as the inner code. These encoders combine codes intended for random error correction with codes for burst error correction. Moreover, the concatenation of the RS code with an inner convolutional code is a common configuration and is widely used in wireless standards such as IEEE 802.16 and 802.16a/d/e [28]. The encoder is followed by the inverse fast Fourier transform (IFFT) to obtain the OFDM transmit signal, whereas the FFT is applied at the receiver to recover the signal.

All OFDM signals transmitted from the mobile stations (MSs) pass through SDMA channels. MUD techniques are applied at the BS to detect the different user-transmitted signals with the support of their unique CIRs. The CIRs are initially estimated using channel estimation techniques at the BS and employed in MUD procedures.

Fig. 2 SDMA-OFDM system model. Mobile stations: users 1 to L, each with an encoder, interleaver, and IFFT. SDMA MIMO channel. Base station: P-element receive antenna array, FFT, MUD techniques (MMSE-ML & GA), de-interleavers 1 to L, and decoders 1 to L

3 Detection Techniques for SDMA-OFDM Systems

SDMA-OFDM system implementation success depends on keeping the orthogonality of each user’s CIR. However, non-orthogonality would require powerful MUD schemes in order to recognize user signals. MUD schemes must be applied at the receiver to detect signals arriving from different users. In the uplink scenarios, L different transmitted signals are separated at the BS with the support of their unique, user-specific spatial signature CIRs. A variety of linear and non-linear MUDs have been proposed to perform user separation. Linear detection methods deal with all transmitted signals as interference, with the exception of the preferred signal from the target user. Hence, interference signals from other users are reduced or cancelled out in the course of identifying the desired signal from the target transmitting user.

MMSE linear MUD is a popular SDMA receiver design approach [26]. The MMSE weight matrix, given in Eq. (3), maximizes the post-detection signal-to-interference-plus-noise ratio (SINR).

$$\mathbf{W}_{MMSE} = \left(\mathbf{H}^H \mathbf{H} + \sigma_n^2 \mathbf{I}\right)^{-1} \mathbf{H}^H \qquad (3)$$

where $\sigma_n^2$ is the noise variance and $(\cdot)^H$ denotes the Hermitian transpose operation.
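The MMSE weight matrix of Eq. (3) can be sketched in a few lines of standard-library Python. This is an illustrative implementation under stated assumptions: the helper names (`hermitian`, `matmul`, `solve`, `mmse_weights`, `mmse_detect`), the 3 × 2 example channel, and the noise variance are hypothetical, and the inverse is applied through Gauss-Jordan elimination rather than whatever routine the authors' MATLAB tool uses.

```python
def hermitian(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[M[r][c].conjugate() for r in range(len(M))]
            for c in range(len(M[0]))]

def matmul(A, B):
    """Plain matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mmse_weights(H, noise_var):
    """W = (H^H H + noise_var * I)^(-1) H^H, an L x P matrix."""
    Hh = hermitian(H)
    A = matmul(Hh, H)
    for i in range(len(A)):
        A[i][i] += noise_var
    return solve(A, Hh)

def mmse_detect(H, x, noise_var):
    """Soft MMSE estimate of the transmitted vector: W x."""
    W = mmse_weights(H, noise_var)
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Hypothetical 3 x 2 channel and noiseless received vector
H = [[1 + 0j, 0.5j], [0.2 + 0j, 1 + 0j], [0.3j, -0.4 + 0j]]
s = [1 + 1j, -1 - 1j]
x = [sum(h * sym for h, sym in zip(row, s)) for row in H]
s_hat = mmse_detect(H, x, noise_var=0.01)
print(s_hat)
```

With the noise variance set to zero the same code reduces to zero-forcing detection, which recovers `s` exactly from a noiseless `x` when H has full column rank.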

Linear detection methods do not perform as well as nonlinear receiver schemes, but their low complexity suits hardware implementation. Building on linear detection techniques, the OSIC scheme improves performance without significantly increasing the complexity [29]. It does so by selecting the stream with the highest SINR at each detection stage.

Furthermore, one nonlinear technique considered optimal in terms of performance is the Maximum Likelihood (ML) detector. It computes the Euclidean distance between the received signal vector and the product of each possible transmitted signal vector with the given channel, and selects the candidate with the minimum distance. ML detection determines the estimate of the transmitted signal vector s as:

$$\tilde{\mathbf{s}}_{ML} = \arg\left\{ \min_{\mathbf{s} \in \mathcal{M}^{L}} \left\| \mathbf{x} - \mathbf{H}\mathbf{s} \right\|^2 \right\} \qquad (4)$$

where $\| \mathbf{x} - \mathbf{H}\mathbf{s} \|^2$ corresponds to the ML metric function. An optimum ML detector therefore searches the set of all possible data vectors, and the vector that gives the smallest norm in (4) is the most reliable answer. Because ML is an exhaustive search over all possible constellation points, its complexity grows exponentially with the number of users and the number of bits per symbol [6, 27]. As the number of users rises, the ML detector rapidly becomes infeasible. Although this technique is computationally complex, its performance serves as a reference for other detection methods since it provides the best possible performance. Consequently, there is ongoing research on suboptimum methods and algorithms that can provide near-optimal performance with lower computational complexity.
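The exhaustive search behind Eq. (4) can be illustrated with a short Python sketch. The 4-QAM alphabet, the 2 × 2 example channel, and the function names `ml_metric` and `ml_detect` are assumptions chosen for illustration; the loop over `itertools.product` makes the exponential number of candidates (constellation size raised to the number of users) explicit.

```python
from itertools import product

QAM4 = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]   # assumed unnormalized 4-QAM points

def ml_metric(H, x, s):
    """Squared Euclidean distance ||x - H s||^2 for one candidate vector s."""
    return sum(abs(xp - sum(h * sym for h, sym in zip(row, s))) ** 2
               for row, xp in zip(H, x))

def ml_detect(H, x, constellation):
    """Try every possible symbol vector and keep the one with the smallest metric."""
    L = len(H[0])
    return min(product(constellation, repeat=L),
               key=lambda s: ml_metric(H, x, s))

# Hypothetical 2 x 2 channel and noiseless received vector
H = [[1 + 0j, 0.5j], [0.2 + 0j, 1 + 0j]]
s_true = (1 + 1j, -1 - 1j)
x = [sum(h * sym for h, sym in zip(row, s_true)) for row in H]
s_ml = ml_detect(H, x, QAM4)
print(s_ml)
```

For L users and a constellation of C points the search visits C to the power L candidates, so even this small alphabet becomes costly as L grows.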


Sphere Detection (SD) is a suboptimum method that simplifies MIMO detection while preserving BER performance close to that of optimum ML detection [9]. SD seeks the transmitted signal vector with the minimum ML metric, that is, the ML solution vector. QRM-MLD is another suboptimum scheme, in which the number of users is assumed to equal the number of receive antennas at the BS and the QR decomposition of the channel matrix is used [7]. The performance of QRM-MLD relies on the number of candidate vectors M; as M increases, its performance approaches ML performance at the cost of higher complexity. Therefore, QRM-MLD complexity is still high, especially for high modulation orders and large numbers of users.

Among the suboptimum techniques, Genetic Algorithms (GAs) are powerful optimization tools that are well suited to optimization problems involving large search spaces and that include mechanisms to escape from local optima. A GA is an evolutionary computation methodology in which evolution is carried out through the application of dedicated operators: Selection, Crossover, and Mutation. These operators introduce a variety of control parameters that govern the advancement from one population to the next. Thus, the GA tool for the MUD optimization problem in SDMA-OFDM systems (GASOS) is applied to find the best result among several solutions representing possible transmitted signals.

4 Methodology of GASOS Tool-Based MUD

In contrast to the optimal ML detector, which exhaustively searches all possible solutions, the GASOS tool does not require testing of all solutions. Instead, it applies dedicated operators to attain a near-optimal solution.

Fig. 3 Flowchart of GA optimization MUD technique (define objective function and select GA parameters; create initial population, with the initial range taken from the MMSE detector output; evaluate fitness for each individual; if the termination criteria are not met, apply selection, crossover, mutation, and evaluation; otherwise take the decision, output the best individual, and finish the GA)

A significant step in applying GASOS is to define the objective function, which is specific to each optimization problem. The ML decision metric of (4) can be used in GASOS, where the decision metric required for the pth receive antenna, that is, the antenna-specific objective function, is defined by Jiang and Hanzo [20] and Ke-Lin and Swamy [26]:

$$\Omega_p(\tilde{\mathbf{s}}_p) = \left| x_p - \mathbf{H}_p \tilde{\mathbf{s}}_p \right|^2 \qquad (5)$$

where $x_p$ is the received symbol at the input of the pth receiver, $\mathbf{H}_p$ is the pth row of the channel transfer function matrix H, and the candidate solution is denoted by $\tilde{\mathbf{s}}_p$. Thus, the best estimated transmitted symbol vector depends on the knowledge of the received signal at the pth receiver antenna and is given by:

$$\tilde{\mathbf{s}}_{GA_p} = \arg\left\{ \min_{\tilde{\mathbf{s}}_p} \left[ \Omega_p(\tilde{\mathbf{s}}_p) \right] \right\} \qquad (6)$$

GASOS starts with a population that consists of a fixed number of genes, where each gene represents one candidate solution to the problem that will be tested against all other genes. In each generation, all genes in the population are evolved and then evaluated according to the fitness function. Generally, fitter genes have a higher probability of being chosen as parent genes that produce offspring genes through crossover and mutation. The newly produced offspring genes form a new population. The new population then repeats the above steps until a certain number of generations has elapsed or a satisfactory solution is produced [14]. Figure 3 summarizes the main steps of the introduced GASOS optimization tool:

Step 1 Define the objective function as in Eq. (5), using knowledge of the received signal and the channel matrix. Then set the GASOS parameters, including the type and probability of the mutation and crossover operations, the maximum number of generations, the population size, and the termination criteria. Each gene (i.e. possible transmitted solution) comprises a number of variables equal to the number of users' transmit antennas.

Step 2 The MMSE detector output is used as an initial solution to improve the starting point of the optimization and decrease the number of generations. Moreover, the range of solutions in the MUD optimization problem is readily determined, as it depends on the modulation type and order. For 16-QAM modulation the range [-3, 3] is used; thus the initial range can be provided to the algorithm.

Step 3 Evaluate the fitness for each gene in the population based on Eq. (6). The evaluation proceeds in two stages. In the first stage, each gene in the population is multiplied by the channel matrix to obtain the estimated received signal. In the second stage, each estimated received signal is compared to the actual received signal and the fitness of the tested gene is obtained. The fittest gene is the one with the lowest numerical fitness value, meaning that it gives the minimum error between the actual received signal and the signal estimated using this gene. The best gene can be considered the GASOS solution to the MUD optimization problem, and at this stage the first (initial) generation is complete.

Step 4 Test the termination criteria, such as reaching the maximum number of generations or reaching the error tolerance between the actual received signal x and the received signal estimated using the genes. If the termination criteria are not met by the current population, the GASOS operators are applied. These operators are selection, crossover, and mutation [15, 16]; the mutated genes are then evaluated and their fitness values computed by applying (6), that is, stage 2 of Step 3. The cycle of operations (i.e. selection, crossover, mutation, and evaluation) is repeated until the termination criteria are reached. When a newly generated gene meets the termination criteria, a decision is taken: the best solution or gene is chosen and the algorithm concludes.
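Steps 1 to 4 can be sketched as a compact genetic loop in Python. This is a hedged illustration rather than the GASOS implementation: the paper does not specify the operator types, so binary tournament selection, single-point crossover, per-symbol mutation, and elitism are assumptions, as are the function names and the 2 × 2 example. The crossover and mutation probabilities (0.95 and 0.1) follow Table 1, and `seed` merely stands in for a hard-decided MMSE output.

```python
import random

QAM4 = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]   # assumed 4-QAM alphabet

def fitness(H, x, gene):
    """Squared error between x and H*gene; a lower value means a fitter gene."""
    return sum(abs(xp - sum(h * g for h, g in zip(row, gene))) ** 2
               for row, xp in zip(H, x))

def tournament(population, H, x, rng):
    """Assumed selection rule: keep the fitter of two randomly drawn genes."""
    a, b = rng.sample(population, 2)
    return a if fitness(H, x, a) <= fitness(H, x, b) else b

def ga_detect(H, x, seed_gene, pop_size=20, generations=10,
              p_cross=0.95, p_mut=0.1, tol=1e-9, rng=None):
    rng = rng or random.Random(0)
    L = len(H[0])
    # Step 2: initial population = seed estimate plus random genes
    population = [tuple(seed_gene)]
    population += [tuple(rng.choice(QAM4) for _ in range(L))
                   for _ in range(pop_size - 1)]
    # Step 3: evaluate every gene and keep the best one
    best = min(population, key=lambda g: fitness(H, x, g))
    for _ in range(generations):
        # Step 4: stop early once the error tolerance is reached
        if fitness(H, x, best) <= tol:
            break
        offspring = [best]   # elitism (an assumption): never lose the best gene
        while len(offspring) < pop_size:
            p1 = tournament(population, H, x, rng)
            p2 = tournament(population, H, x, rng)
            child = p1
            if L > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, L)            # single-point crossover
                child = p1[:cut] + p2[cut:]
            child = tuple(rng.choice(QAM4) if rng.random() < p_mut else g
                          for g in child)            # per-symbol mutation
            offspring.append(child)
        population = offspring
        best = min(population, key=lambda g: fitness(H, x, g))
    return best

H = [[1 + 0j, 0.5j], [0.2 + 0j, 1 + 0j]]    # hypothetical 2 x 2 channel
s_true = (1 + 1j, -1 - 1j)
x = [sum(h * g for h, g in zip(row, s_true)) for row in H]
seed = (1 + 1j, 1 + 1j)                     # stand-in for an MMSE hard decision
s_ga = ga_detect(H, x, seed)
print(s_ga, fitness(H, x, s_ga))
```

Because the best gene is always carried into the next generation, the returned solution is never worse than the seed, which mirrors the role of the MMSE output as a starting point.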

5 Simulation Results

A simulation tool was built in this research to analyze SDMA communication systems. The presented simulation results were obtained on a desktop computer with a 3.2 GHz Intel(R) Core(TM) i7 processor and 6 GB of RAM using the MATLAB 7.13.0.564 (R2011b) environment. The analysis tool models the uplink SDMA-OFDM system described in Fig. 1, and the main parameters are defined in the tool. For OFDM, the parameters of the IEEE 802.16 Mobile WiMAX standards are used as an example application [28, 29]. Conventional concatenated coding is used, where the Reed-Solomon (RS) code is realized as an outer code followed by an interleaver. Convolutional coding is applied as the inner code, and a Viterbi decoder is applied on the receiver side. Table 1 shows the parameters of the developed analysis tool. Rayleigh fading channels are applied based on the standard ITU channel model [30], and perfect channel estimation is assumed. At the BS receiver, various MUD techniques are applied. One of these schemes is the GASOS tool for MUD optimization problems. This tool is constructed according to the steps described in Fig. 3.

Table 1 Parameters included in the simulation tool

| Standard/technique | Parameter | Value |
|---|---|---|
| Conventional concatenated code: RS outer code | RS codeword length | 15 |
| | RS un-coded message length | 11 |
| Conventional concatenated code: interleaver | Interleaver length | FFT size |
| Conventional concatenated code: convolution inner code | Code rate | 1/2 |
| | Constraint length | 7 |
| | Octal generator polynomial | [171 133] |
| Rayleigh channel | Maximum path delay (μs) | [0 0.2 0.8 1.2 2.3 3.7] |
| | Tap power (dB) | [0 -0.9 -4.9 -8 -7.8 -23.9] |
| | Doppler maximal frequency parameter | 100 (120 km/h) |
| OFDM parameters based on WiMAX | System bandwidth (MHz) | 5 |
| | Sample time (1/Fs, ns) | 178.6 |
| | FFT size (NFFT) | 512 |
| GA parameters | Population size | Varied |
| | Number of generations (iterations) | Varied |
| | Population initialization | MMSE output |
| | Crossover probability | 0.95 |
| | Mutation probability | 0.1 |

The performance of the conventional low-complexity MMSE detector can be optimized using the GA, as explained in the GASOS methodology in Sect. 4. The MMSE detector output is included in the initial population for GASOS alongside other random initial solutions. All these candidate solutions are then tested to decide which is the best solution and provides the minimum BER. Figure 4 shows the optimized GA performance and compares it with the classic detectors and the optimal ML detector. The results consider a 4 × 4 SDMA-OFDM system with 4-QAM modulation, where there are four users (L = 4) in the system along with four receive antennas (P = 4) at the BS. The GA population contains 40 candidate solutions and the number of generations is 20.

Figure 5 shows the performance comparison between GASOS optimization-based MUD and other MUD schemes, including the suboptimum QRM-MLD and SD detectors, for a 4 × 4 system using 4-QAM modulation. One interesting advantage of the GASOS tool is its ability to handle the overloaded scenario, in which the number of users is higher than the number of receiver antennas at the BS. In contrast to the above simulation results for the fully-loaded scenario, Fig. 6 presents the BER performance for a 6 × 4 SDMA-OFDM system with 4-QAM, which is an overloaded case. This case catastrophically degrades the performance of conventional detectors as a result of insufficient degrees of freedom for detection at the receiver. Consequently, signals from different users cannot be distinguished because the high number of users causes excessive Multi-User Interference (MUI).

In contrast to classic detectors, Fig. 6 shows that the GA performs effectively in overloaded scenarios and overcomes the problem of conventional MUD schemes; the GA gives performance close to that of the optimal ML detector. These results also indicate that the GA can feasibly support a high number of users, because the GA-based technique is free of any constraint on the rank of the channel matrix.

In addition, Fig. 7 presents the BER performance for different numbers of generations (G) versus population size (Z); the population size varies between Z = 10 and 40, while the number of generations varies between G = 1 and G = 17. These results describe the effect of population size and number of generations on performance.

Fig. 4 BER performance of GA, MMSE, OSIC, and ML detectors for 4 × 4 SDMA-OFDM systems (BER versus Eb/No in dB)

6 Complexity Analysis

This section discusses and compares the computational complexity of the various MUD schemes, including the GA optimization technique. The analysis is based on the number of real multiplications in the ML decision metric (i.e. Eq. (4)) and the derived complexity levels for the other detectors. As mentioned in Sect. 3, an optimal ML detector has exponential complexity that increases with the number of users L and the modulation order. Thus, the number of real multiplications [5] for ML is:

$$4LPC + 2PC^{L} \qquad (7)$$

where C is the number of constellation points. The first and second terms indicate, respectively, the computations of Hs and the norm square operation in Eq. (4).

Fig. 5 BER performance comparison between classic, optimum, sub-optimum MUD schemes and GASOS (curves: MMSE, OSIC, QR-MLD (M=2), QR-MLD (M=4), SD, GA, ML)

Fig. 6 BER performance of GASOS versus other MUD schemes in 6 × 4 overloaded scenario system (BER versus Eb/No in dB; curves: MMSE, OSIC, GA, ML)

Compared to the ML detector, the GA-MUD requires (Z × G) metric calculations: a population of Z candidate vectors, each containing L transmitted signals, is evaluated in each of the G generations. Therefore, based on this number of metric evaluations and following the same derivation procedure used in [7] for ML, it can be deduced that the number of real multiplications for the GA is:

$$4LP \cdot \exp\!\left(\frac{\log(Z \cdot G)}{L}\right) + 2P \cdot Z \cdot G \qquad (8)$$

where the first and second terms indicate, respectively, the computations of Hs and the norm square operation in Eq. (5). Furthermore, this number of multiplications may readily be reduced by avoiding repeated calculations for identical solutions (i.e. genes), either within the same generation or across the entire iterative process, provided that the receiver has the necessary memory for storing the evolution history. Another important difference between the GA and other detectors is the type of termination criteria, whereby the number of generations is not fixed. Thus, the GA can provide a near-optimal solution and finish before completing all iterations, based on the specified error tolerance.

Fig. 7 BER versus population size along with different numbers of generations for GASOS applied in 4 × 4 system. Population size is varied (i.e. Z = 10, 20, 30, and 40) and number of generations is varied (G = 1, 5, 9, 13, and 17)

Fig. 8 Complexity versus population size with different generations (G) for GA. Population size is varied (i.e. Z = 10, 20, 30, and 40); number of generations (G) is varied (G = 1, 5, 9, 13, and 17)
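The two multiplication counts can be evaluated numerically with a small Python sketch. It assumes that Eq. (7) reads 4LPC + 2PC^L and that Eq. (8) reads 4LP·exp(log(Z·G)/L) + 2P·Z·G, which is one reading of the typeset formulas; the function names are hypothetical. Under this reading, Eq. (8) is Eq. (7) with the C^L exhaustive candidates replaced by the Z·G genes actually evaluated.

```python
import math

def ml_multiplications(L, P, C):
    """Assumed reading of Eq. (7): 4*L*P*C + 2*P*C**L real multiplications."""
    return 4 * L * P * C + 2 * P * C ** L

def ga_multiplications(L, P, Z, G):
    """Assumed reading of Eq. (8): C**L is replaced by Z*G metric evaluations."""
    evaluations = Z * G
    return 4 * L * P * math.exp(math.log(evaluations) / L) + 2 * P * evaluations

# 4 x 4 system with 4-QAM (C = 4), as in the simulation section
print(ml_multiplications(L=4, P=4, C=4))
# GA count for a few population sizes at a fixed number of generations
for Z in (10, 20, 30, 40):
    print(Z, round(ga_multiplications(L=4, P=4, Z=Z, G=5), 1))
```

When Z·G equals C^L the two expressions coincide, so under this reading any saving comes from evaluating fewer candidates than the exhaustive search, or from stopping early once the tolerance is met.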

From Eq. (8) it can be seen that the complexity of the GA grows with the population size Z and the number of generations G, in addition to the number of users and receiver antennas. Figures 8 and 9 show the effect of these two parameters, Z and G, on the complexity of the GA, assuming a 4 × 4 SDMA-OFDM system under the Rayleigh channel employing 4-QAM modulation, with the signal-to-noise ratio $E_b/N_0$ fixed at 5 dB. It is noted that increasing the population size (i.e. the number of candidate genes) has a more favorable effect than increasing the number of generations in terms of performance and complexity. A slight increase in Z improves performance without raising the complexity greatly; a slight increase in the number of generations improves the performance as well, but increasing G affects the complexity more than increasing the population size does. Moreover, managing and controlling these two operational parameters, G and Z, can provide an adaptive means of obtaining the required performance under complexity limitations.

Fig. 9 Complexity versus number of generations with different population size (Z) for GA. Population size is varied (i.e. Z = 10, 20, 30, and 40); number of generations (G) is varied (G = 1, 5, 9, 13, and 17)

7 Hardware Implementation of GASOS

The promising results of GASOS in MUD optimization problems, and its potential to improve the performance of conventional detectors while attaining near-optimal solutions, motivate investigating its implementation in real-time systems. Although GASOS is able to find accurate solutions, the time required for the large number of computations and iterations is enormous when the number of users is high, particularly in overloaded scenarios. Also, large population sizes and numbers of generations are required to enhance performance and attain a near-optimal solution. However, one interesting characteristic of GAs is that they are suitable for pipelining and parallelization. The operational time can thus be kept fixed while enhancing the performance. All these factors make GAs appropriate candidates for hardware implementation. In this research, a design implementation of GASOS is introduced that relies on recent progress in FPGA technology, which provides a suitable platform for exploiting the parallel characteristics of genetic algorithms.

The Xilinx System Generator (XSG) tool offers a set of models for a range of hardware operations that can be implemented on a variety of Xilinx FPGAs. Thus, the functionality of the hardware system can be verified by simulation within the Simulink environment. This work proposes and demonstrates an FPGA design flow for GASOS that combines the Simulink family of products, XSG, and Xilinx FPGA blocks. GASOS is first modeled and simulated with the Xilinx System Generator blockset and then run on a Xilinx Virtex-6 ML605 FPGA board. Moreover, Simulink simulation is combined with the hardware implementation in a co-simulation model to validate system functionality.

7.1 Implementation Design Methodology-Based FPGA-Xilinx System Generator

XSG starts the hardware system design from a graphical, high-level Simulink environment. System Generator extends the traditional Hardware Description Language (HDL) design flow by making graphical modules available. The graphical language allows the design concept to be expressed with the available System Generator blocks and subsystems. The tool also provides Hardware Co-simulation (HwCosim) capability [31].

System Generator works within the Simulink simulation environment, which is part of the MATLAB package. A HwCosim block can be produced once the design has been compiled into an FPGA bitstream. Table 2 lists the available clocks and supported interfaces for the target ML605 FPGA board. The tools employed in this research work are MATLAB R2009b (64-bit) along with Simulink 7.4 and System Generator 12.3.

Table 2 Hardware co-simulation configurations for ML605 board

| Board | Interface | System clock frequency | Available frequencies (MHz) |
|---|---|---|---|
| Xilinx ML605 | JTAG | 100 MHz | 100 |
| | Point-to-point Ethernet | | 66.7 |
| | Network-based Ethernet | | 50 |

7.2 Shared Memory Support

System Generator hardware co-simulation interfaces allow shared memory blocks and shared registers to be compiled and co-simulated on FPGA hardware. Shared memories help achieve high-speed data transfer between the host computer and the FPGA. The Xilinx Shared Memory Read and Write blocks allow hardware co-simulation designs to transfer Simulink vector and matrix signals in a single simulation cycle.

Figure 10 demonstrates the principle of the shared memory approach applied in hardware co-simulation.

7.3 FPGA Implementation of GASOS

The main block diagram proposed for GASOS implementation in the FPGA hardware environment is shown in Fig. 11. The design assumes that eight users are transmitting data, the BS has eight receiver antennas, and 16-QAM modulation is used at the transmitter. An initial MATLAB simulation program generates the initial parameters and transfers them through the shared memories. These parameters include the actual received signals and the channel gain values, where channel estimation at the receiver is assumed to be perfect. In addition, the initial estimates from the MMSE detector output are used to select five initial population points around the transmitted point for each user. The five points are taken from the Gray-coded 16-QAM constellation, and they are calculated for each user. This initial population forms the genes through a Form-Genes subsystem in the design. Then, in the Gene Testing stage, the estimated received signal is computed and the fitness function is calculated for every gene. The best gene is finally identified as the one that provides the minimum error between the actual received signal and the estimated received signal corresponding to the gene.
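The search over the initial population can be illustrated in software. The following is a minimal sketch rather than the hardware design itself: it assumes the error measure is the squared Euclidean distance between the actual and estimated received vectors, and the function names, channel values and candidate points are hypothetical, chosen only for a small two-user example.

```python
from itertools import product

def estimate_received(H, s):
    """Estimated received vector H*s; H is a list of rows, one row per antenna."""
    return [sum(h * x for h, x in zip(row, s)) for row in H]

def fitness(y, H, s):
    """Assumed error measure: squared distance between actual and estimated signals."""
    return sum(abs(yi - ei) ** 2 for yi, ei in zip(y, estimate_received(H, s)))

def best_gene(y, H, candidates):
    """candidates[k] lists the points kept for user k; every combination is one gene."""
    return min(product(*candidates), key=lambda s: fitness(y, H, s))

# Toy 2-user, 2-antenna case (illustrative values only).
H = [[1 + 0j, 0.5j], [0.2 + 0j, 1 - 0.3j]]
sent = (1 + 3j, -3 - 1j)
y = estimate_received(H, sent)  # noise-free received signal
candidates = [
    [1 + 3j, 1 + 1j, 3 + 3j, -1 + 3j, 3 + 1j],      # five points near user 1's estimate
    [-3 - 1j, -1 - 1j, -3 - 3j, -3 + 1j, -1 - 3j],  # five points near user 2's estimate
]
print(best_gene(y, H, candidates))
```

With eight users and five points per user the same enumeration would cover all combinations of the initial population, which is the part the hardware pipelines.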

Figure 12 shows the GASOS design implementation using the Xilinx Virtex-6 FPGA. It extends Fig. 11 and exploits the parallel property of the GA.

In the first step, the formed genes are divided between two parallel implementations, generating two subpopulations. Each of the two subpopulations is tested to calculate the fitness function and find the best gene. These two parallel steps represent the first generation in the GA. Subsequently, the crossover and mutation operations are applied to each of the previous subpopulations. Next, the mutation output is tested to calculate the fitness function and choose the best gene among the second generation. Finally, the outputs of the two generations are compared in order to retain the best gene and the subpopulation that includes the best solution. The various implementation stages and subsystems are explained in the following sections and are arranged in accordance with Fig. 11.

Fig. 10 Schematic of hardware co-simulation and shared memory (blocks: parallel program running on FPGA-based system; sequential program running on microprocessor-based system; Shared Memory A; Shared Memory B)
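The two-subpopulation, two-generation flow described for Fig. 12 can be sketched as follows. This is an illustrative software model under stated assumptions: genes are 32-bit integers, neighbouring genes are paired for crossover, the subpopulations are processed one after the other instead of in parallel, each subpopulation has an even size of at least two, and the `error` function and example genes are hypothetical stand-ins for the fitness computation.

```python
import random

MUTATION_MASK = 96  # a gene with two set bits, as used in the mutation stage

def evolve_subpopulation(pop, error, rng, p_cross=0.95, p_mut=0.1, n_bits=32):
    """Generation 1 tests pop; generation 2 tests the crossed and mutated genes."""
    best_first = min(pop, key=error)
    children = []
    for a, b in zip(pop[0::2], pop[1::2]):  # assumed pairing of neighbours
        if rng.random() < p_cross:
            low = (1 << rng.randrange(1, n_bits)) - 1  # bits below the crossover point
            a, b = (a & ~low) | (b & low), (b & ~low) | (a & low)
        children += [a, b]
    children = [c ^ MUTATION_MASK if rng.random() < p_mut else c for c in children]
    best_second = min(children, key=error)
    return min(best_first, best_second, key=error)  # keep the better generation

def gasos_flow(genes, error, seed=0):
    """Split the genes into two subpopulations and keep the overall best gene."""
    rng = random.Random(seed)
    half = len(genes) // 2
    bests = [evolve_subpopulation(p, error, rng) for p in (genes[:half], genes[half:])]
    return min(bests, key=error)

# Hypothetical error: number of bits that differ from a known target gene.
target = 0x12345678
def error(gene):
    return bin(gene ^ target).count("1")

genes = [0x12345600, 0xFFFF5678, 0x00000000, 0x1234FFFF,
         0x02345678, 0x12300000, 0xFFFFFFFF, 0x12345679]
print(hex(gasos_flow(genes, error)))
```

Because the best gene of the first generation is always retained, the result can never be worse than the best member of the initial population.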

7.3.1 Shared Memory Subsystems

There are three subsystems that incorporate shared memory in the current GASOS design. The Initial Population subsystem uses shared registers to transfer five initial points around each of the eight transmitted points; these points are used to form the genes. The Channel subsystem uses shared memory to obtain the real and imaginary channel parameters, which are used in the Test-Genes subsystem. The Received Signal subsystem uses shared memories to obtain the real and imaginary received signal vectors, which serve as reference genes for comparison with the output of the Test-Genes subsystem. The best gene among all possible solutions is then selected.

Fig. 11 Proposed implementation block diagram for GASOS optimization technique (blocks: A-Shared Memory with Initial Population, Channel and Received Signal; B-Timing Control; C-Form Genes; D-Test Genes; E-Choose Best Gene; F-Mutation & Crossover; Multiplexer; Dual Port RAM; output: Best Gene)

7.3.2 Timing-Control Subsystem

The Timing-Control subsystem manages the start and end times of each operational block in the design, which helps save processing time. Two dependent signals control the subsystem’s output, and an AND gate is used to combine them. The output of the Timing-Control subsystem sets the start time of the Form-Genes subsystem and, after a delay of two clock cycles, the start time of the Test-Genes subsystem. The Timing-Control subsystem is also used with the crossover and mutation operations to control the address at which the crossed and mutated genes are written into the dual-port RAM.
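The enable logic can be pictured with a small cycle-by-cycle model. This is only a sketch: the two input signal names (`start`, `running`) are hypothetical, since the text does not name the two dependent signals, and the Test-Genes enable is modelled simply as the Form-Genes enable delayed by two cycles.

```python
def timing_control(start, running, test_delay=2):
    """Return per-cycle enables for Form-Genes and Test-Genes.

    start, running: lists of 0/1 values, one per clock cycle (hypothetical names).
    """
    form_enable = [a & b for a, b in zip(start, running)]  # AND gate
    # Test-Genes starts test_delay cycles after Form-Genes.
    test_enable = ([0] * test_delay + form_enable)[:len(form_enable)]
    return form_enable, test_enable

start = [0, 1, 1, 1, 1, 1, 1, 1]
running = [1, 1, 1, 1, 1, 1, 0, 0]
form_enable, test_enable = timing_control(start, running)
print(form_enable)
print(test_enable)
```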

7.3.3 Form-Genes Subsystem

This subsystem generates all possible genes (i.e. possible estimated solutions for the user’s transmitted signal). Instead of forming totally random solutions, five sphere points around each user’s transmitted signal are transferred through shared memory registers to form the genes, so that $5^8$ possible genes are generated. Within the Form-Genes subsystem, a subsystem called States is implemented to enumerate all the possible gene combinations. The initial population points for each user and the States subsystem output are used in the SphPoint_Selection subsystems, where slicers extract a sequence of bits from the input data to create a new data value of unsigned type with its binary point at zero. The Mapping subsystem forms the part of the gene that represents the possible data transmitted by user number one. This subsystem maps each sphere point to the appropriate symbol in Gray-coded 16-QAM modulation. The address of the real and imaginary parts of the mapped symbol is specified by the sphere point. The output of the Mapping subsystem for each user represents that user’s part of the gene. A concatenation block is used to assemble the full gene, which is then written into a specific address in the dual-port RAM.

Fig. 12 Hardware implementation design of GASOS using Xilinx Virtex-6 FPGA
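The gene formation in the Form-Genes subsystem can be sketched in software as a mixed-radix counter followed by a lookup and a concatenation. The sketch below makes several assumptions that are not taken from the design: each user contributes a 4-bit 16-QAM label (8 users giving a 32-bit gene), user 1 occupies the least significant bits, and the particular Gray mapping of bit pairs to amplitude levels is one common choice. All function names are hypothetical.

```python
# Assumed Gray mapping of a 2-bit pair to a 16-QAM amplitude level.
GRAY_LEVEL = {0b00: -3, 0b01: -1, 0b11: 1, 0b10: 3}

def map_16qam(bits4):
    """Map a 4-bit label to a symbol: upper two bits -> real, lower two -> imaginary."""
    return complex(GRAY_LEVEL[bits4 >> 2], GRAY_LEVEL[bits4 & 0b11])

def state_digits(index, n_users, n_points=5):
    """Base-n_points digits of index; digit k selects the sphere point for user k."""
    digits = []
    for _ in range(n_users):
        index, d = divmod(index, n_points)
        digits.append(d)
    return digits

def form_gene(index, sphere_points):
    """Concatenate the selected 4-bit label of every user into one integer gene."""
    n_points = len(sphere_points[0])
    gene = 0
    for user, d in enumerate(state_digits(index, len(sphere_points), n_points)):
        gene |= sphere_points[user][d] << (4 * user)
    return gene

def gene_symbols(gene, n_users):
    """Recover the per-user 16-QAM symbols from a gene."""
    return [map_16qam((gene >> (4 * u)) & 0xF) for u in range(n_users)]

# Hypothetical sphere points: five 4-bit labels per user for eight users.
sphere_points = [[(u + d) % 16 for d in range(5)] for u in range(8)]
print(hex(form_gene(0, sphere_points)))
print(gene_symbols(form_gene(0, sphere_points), 8)[:2])
```

Iterating `index` from 0 to $5^8 - 1$ visits every gene once, which corresponds to one gene written per clock cycle in the pipelined hardware.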

7.3.4 Test-Genes Subsystem

The Test-Genes subsystem extracts the real and imaginary parts of the genes and channel parameters from its inputs by means of the ChReal, ChImag, GenReal, and GenImag subsystems. These subsystems differ in the number of bits, the offset and the binary point position, which are exposed as mask parameters and set according to the output requirements. The real and imaginary parts of the gene and channel are then fed into the embedded complex multipliers, which compute the estimated received signal for each possible gene. The outputs are the real and imaginary parts of the estimated received signals; the closer these values are to the actual received signals, the better the performance.
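The arithmetic performed by the complex multipliers can be written out with separate real and imaginary parts, mirroring the ChReal, ChImag, GenReal and GenImag signals. This is a floating-point sketch with hypothetical function names; the fixed-point word lengths of the hardware are not modelled.

```python
def complex_mult(a_re, a_im, b_re, b_im):
    """(a_re + j a_im)(b_re + j b_im) using four real multiplications."""
    return a_re * b_re - a_im * b_im, a_re * b_im + a_im * b_re

def estimate_received(ch_re, ch_im, gen_re, gen_im):
    """Estimated received signal per antenna.

    ch_re[p][l], ch_im[p][l]: channel from user l to antenna p.
    gen_re[l], gen_im[l]: symbol of user l in the gene under test.
    """
    est_re, est_im = [], []
    for row_re, row_im in zip(ch_re, ch_im):
        acc_re = acc_im = 0
        for h_re, h_im, g_re, g_im in zip(row_re, row_im, gen_re, gen_im):
            p_re, p_im = complex_mult(h_re, h_im, g_re, g_im)
            acc_re += p_re
            acc_im += p_im
        est_re.append(acc_re)
        est_im.append(acc_im)
    return est_re, est_im

# One antenna, two users (illustrative values).
print(estimate_received([[1, 3]], [[2, -1]], [1, -3], [-1, 3]))
```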

7.3.5 Choose-Best-Gene Subsystem

The best gene is the one that achieves the minimum error among all genes, and it represents the detected signal for all users. The Choose-Best-Gene subsystem compares the actual received signal with the estimated received signals: the estimated real and imaginary parts coming from the Test-Genes subsystem are compared against the received (i.e. reference) signal. The minimum error between the received signal and the estimated vector is identified, and the corresponding gene is selected as the GA solution.
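Since genes arrive one per clock cycle, the selection can be modelled as a running minimum over a stream. The sketch below assumes a squared-error measure summed over the real and imaginary parts; the function name, the stream format and the example values are hypothetical.

```python
def choose_best_gene(stream, ref_re, ref_im):
    """Keep the gene whose estimated signal is closest to the reference signal.

    stream yields (gene, est_re, est_im) tuples, one per clock cycle.
    """
    best_gene, best_err = None, float("inf")
    for gene, est_re, est_im in stream:
        err = sum((r - e) ** 2 for r, e in zip(ref_re, est_re))
        err += sum((r - e) ** 2 for r, e in zip(ref_im, est_im))
        if err < best_err:  # comparator plus register holding the current best
            best_gene, best_err = gene, err
    return best_gene, best_err

stream = [
    (0xA, [1, 0], [0, 0]),
    (0xB, [2, 1], [1, 1]),
    (0xC, [2, 1], [1, 0]),
]
print(choose_best_gene(stream, ref_re=[2, 1], ref_im=[1, 1]))
```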

7.3.6 Mutation and Crossover Subsystems

Crossover and mutation operations are fundamental to evolving towards better solutions and solving the optimization problem. To generate the random numbers needed by the GA (i.e. to randomly create the crossover point and the crossover and mutation probabilities), a 63-bit Linear Feedback Shift Register (LFSR) Xilinx block is employed. Slicers then capture specific groups of bits to obtain different random numbers. The first 10 bits determine the crossover probability, the following 5 bits choose the crossover point, and the subsequent 10 bits determine the mutation probability.
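The random-number generation and slicing can be sketched in software. The LFSR configuration below is an assumption: a Fibonacci-type register with XOR feedback from taps 63 and 62, a commonly tabulated pair for a 63-bit register. The configuration of the Xilinx block used in the design is not stated in the text, and the choice of the least significant bits as the "first" bits is also assumed.

```python
WIDTH = 63
MASK = (1 << WIDTH) - 1

def lfsr_step(state):
    """One shift of a 63-bit Fibonacci LFSR with assumed taps at bits 63 and 62."""
    feedback = ((state >> 62) ^ (state >> 61)) & 1
    return ((state << 1) | feedback) & MASK

def slice_random_fields(state):
    """Slice the register into the three random numbers used by the GA."""
    crossover_prob = state & 0x3FF          # first 10 bits
    crossover_point = (state >> 10) & 0x1F  # following 5 bits
    mutation_prob = (state >> 15) & 0x3FF   # subsequent 10 bits
    return crossover_prob, crossover_point, mutation_prob

state = 0x123456789ABCDEF  # any non-zero seed (hypothetical value)
for _ in range(5):
    state = lfsr_step(state)
    print(slice_random_fields(state))
```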

In the crossover implementation, the crossover probability is determined by comparing a 10-bit random number from the LFSR block with the number 400 to get a crossover probability approximately equal to 0.95. Since each gene is 32 bits long, the 5-bit crossover point specifies where the first and second genes are each split into two parts. The Multiplexer (Mux) block will choose number 15 as the crossover point in case the crossover probability is equal to zero, which means that the two parts of the two genes move directly to the next process without performing the crossover operation. Because many different crossover points are possible, a single-port ROM block is used to hold all the numbers representing the different possible genes.
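A single-point crossover on 32-bit genes can be sketched with bit masks. Interpreting the ROM as a table of 32 masks addressed by the 5-bit crossover point is an assumption made here for illustration, as is the convention that the bits below the crossover point are the ones exchanged; the crossover decision is passed in as a flag and the threshold comparison is not modelled.

```python
GENE_BITS = 32
ALL_ONES = (1 << GENE_BITS) - 1
# Assumed ROM contents: entry p selects the p least significant bits of a gene.
MASK_ROM = [(1 << p) - 1 for p in range(GENE_BITS)]

def crossover(gene1, gene2, point, do_cross):
    """Swap the bits below the crossover point between two genes."""
    if not do_cross:
        return gene1, gene2  # genes pass through unchanged
    low = MASK_ROM[point]
    high = ALL_ONES ^ low
    return (gene1 & high) | (gene2 & low), (gene2 & high) | (gene1 & low)

child1, child2 = crossover(0xFFFF0000, 0x0000FFFF, point=8, do_cross=True)
print(hex(child1), hex(child2))
```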

The binary mutation operation is implemented such that each gene coming from the crossover subsystem is fed to an XOR block whose other input is the number 96, which represents a gene with all bits set to zero except for two bits. These two bits flip the corresponding bits of the crossed gene to produce a mutated gene. Moreover, the mutation probability of 0.1 is obtained by comparing a 10-bit random number from the LFSR with the number 900. The two outputs of the mutation subsystem go to the Test-Genes and Choose-Best-Gene subsystems. This stage represents the end of the second generation, and the final decision is made using a comparison to choose the final best gene.
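The mutation step can be sketched as an XOR with a fixed mask gated by a threshold comparison. The direction of the comparison (mutate when the random number exceeds the threshold) is an assumption; under it, a uniformly distributed 10-bit number exceeds 900 with probability 123/1024, roughly 0.12, which is close to the stated 0.1.

```python
MUTATION_MASK = 96  # binary 1100000: two bits set, all others zero

def mutate(gene, rand10, threshold=900):
    """Flip the two masked bits when the 10-bit random number exceeds the threshold."""
    return gene ^ MUTATION_MASK if rand10 > threshold else gene

def trigger_probability(threshold, n_bits=10):
    """Probability that a uniform n_bits-wide number is strictly above threshold."""
    return (2 ** n_bits - 1 - threshold) / 2 ** n_bits

print(bin(MUTATION_MASK), trigger_probability(900))
print(hex(mutate(0x12345678, rand10=950)))
```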

8 Hardware Co-Simulation Results and Discussion

Upon design verification and testing in Simulink, a hardware co-simulation block is generated and used to program the FPGA for GASOS design implementation. Figure 13 shows the model with the hardware co-simulation blocks. A Xilinx System Generator hardware co-simulation compilation is used to generate two blocks for two generations. They are embedded with their respective VHDL codes and .bit and .ucf configuration files. JTAG hardware co-simulation is used to run the model on a Xilinx Virtex-6 FPGA built into the ML605 development platform. The inputs for the HwCosim blocks are taken from the initial MATLAB program running on the host computer, which writes to the shared memories, while the FPGA output is provided to the workspace. Figure 13 also illustrates that two GASOS design implementations run concurrently to attain parallel solutions and reduce operation time. A comparison between the two output solutions is done on the computer to obtain the best solution. The number of parallel realizations depends on the percentage of FPGA resources consumed by each module and the number of available FPGAs. The overall design can thus implement parallel hardware processing to meet the time limitations set by the problem and obtain real-time implementation.

Fig. 13 Model with the hardware co-simulation blocks (block labels: Main_Model hwcosim with JTAG Co-sim; Gateway Out; GA_Out; To Workspace GASolution; System Generator)

Table 3 provides details of the design’s resource requirements, where the target device is the Xilinx Virtex-6 FPGA xc6vlx240t-1ff1156. The percentage utilization values in the table express the FPGA area used relative to the resources available on the target device. The table summarizes the utilization for the case of eight users transmitting to a BS equipped with eight receiver antennas. The resource estimation reports indicate that the design is efficient in terms of resource utilization. Therefore, this target device can be used for higher-complexity systems with more users and advanced modulation techniques; alternatively, the design can be implemented on smaller FPGA devices to minimize cost. Furthermore, the utilization summary in Table 4 contains the resource requirements of the GASOS implementation for a system with four users transmitting data to a BS equipped with four receiver antennas. Utilization is evidently minimal in this design.

Focus is directed to algorithm parallelization in order to reduce the time needed to obtain the final solution from the algorithm. This property allows the fitness function to be calculated in the Test-Genes and Choose-Best-Gene subsystems while genes are still being formed in the Form-Genes subsystem, with the latency of two clock cycles counted in the initialization time. Furthermore, the parallel property is applied in the crossover stage, where the crossover operation begins during gene formation and testing, so there is no need to wait until all genes from the Form-Genes subsystem are finalized. Moreover, the pipeline options available in the blocks are exploited. The design illustrates that there is a soft trade-off between speed and FPGA resource requirements, which can be adapted according to the operating scenario.

Table 3 GASOS implementation and resource utilization results for 8 × 8 systems
Device: Xilinx Virtex-6, xc6vlx240t-1ff1156

Information                        Used      Available   Utilization (%)
Number of slice registers          6613      37,680      17
Number of slice LUTs               8921      301,440     2
Number of DSP48Es                  21,349    150,720     14
Number of slices used as memory    1041      58,400      1
Maximum frequency                  93.041 MHz
Minimum period                     10.748 ns

The hardware co-simulation verification in the Simulink environment shows that our design requires 15 initial clock cycles (InitClocks) to deliver the first gene at the output. Thereafter, owing to the pipelined design, each clock cycle provides one gene at the output among all possible solutions in the GA population. Each output gene has already been tested and its fitness value calculated, so the total number of clock cycles (NumClocks) required to get the best gene can be calculated from the following:

$$\mathrm{NumClocks} = \mathrm{InitClocks} + Z \qquad (9)$$

where Z is the GA population size. In addition, the implementation results in Tables 3 and 4 show that the design works at a maximum clock frequency of 93.041 MHz with a minimum clock period of 10.748 ns for the 8 × 8 system design. The maximum clock frequency is 71.475 MHz with a minimum period of 13.991 ns for the 4 × 4 system. This maximum clock frequency determines the total implementation time for the algorithm. The following expression can be applied to get the total design implementation time (TotalTim):

$$\mathrm{TotalTim} = \mathrm{NumClocks} / \mathrm{MaxClock} \qquad (10)$$

where MaxClock is the maximum clock frequency for the design.
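Equations (9) and (10) can be evaluated directly. The short sketch below uses the reported maximum clock frequency for the 8 × 8 design; the population size of 100 is a hypothetical value chosen only to illustrate the arithmetic.

```python
def num_clocks(pop_size, init_clocks=15):
    """Eq. (9): initial latency plus one clock cycle per population member."""
    return init_clocks + pop_size

def total_time_s(pop_size, max_clock_hz, init_clocks=15):
    """Eq. (10): total time in seconds at the maximum clock frequency."""
    return num_clocks(pop_size, init_clocks) / max_clock_hz

MAX_CLOCK_8X8 = 93.041e6  # Hz, reported for the 8 x 8 design
Z = 100                   # hypothetical population size

print(num_clocks(Z))
print(f"{total_time_s(Z, MAX_CLOCK_8X8) * 1e6:.3f} us")
print(f"{1e9 / MAX_CLOCK_8X8:.3f} ns clock period")
```

For this population size the run takes 115 clock cycles, or a little over 1.2 µs, and the clock period computed from the maximum frequency agrees with the reported minimum period of 10.748 ns.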

9 Conclusions

A GASOS tool was modeled, simulated and implemented on a Xilinx Virtex-6 FPGA. Controlling the operational parameters provides a means of adaptively achieving the required performance under complexity limitations. Comparisons were made between classic, optimum, suboptimum and GASOS detection schemes. Design details were presented, illustrating the operational modules designed. GASOS utilizes the parallelization capabilities of genetic algorithms and FPGAs to achieve real-time implementation of MUD in SDMA-OFDM systems.

Resource utilization was optimized, and the total number of clock cycles required was found to be 15 initially, in addition to one clock cycle for every member of the algorithm population.

The maximum clock frequency achieved was 93.04 MHz, with the implementation carried out on a Xilinx Virtex-6 FPGA built into the ML605 development platform using JTAG hardware co-simulation. The current study could be continued by designing a flexible hardware implementation of the GA in terms of the number of users, receiver antennas, and modulation order, and by studying the throughput and performance of such a system.

Acknowledgments This research project was funded by the National Plan for Science, Technology and Innovation (MAARIFAH), King Abdulaziz City for Science and Technology, Kingdom of Saudi Arabia, Award Number (08-ELE262-02).

Table 4 GASOS implementation and resource utilization results for 4 × 4 systems
Device: Xilinx Virtex-6, xc6vlx240t-1ff1156

Information                        Used      Available   Utilization (%)
Number of slice registers          2627      37,680      6
Number of slice LUTs               3924      301,440     1
Number of DSP48Es                  9002      150,720     5
Number of slices used as memory    493       58,400      1
Maximum frequency                  71.475 MHz
Minimum period                     13.991 ns

References

1. Vandenameele, P., Van der Perre, L., Engels, M. G. E., Gyselinckx, B., & De Man, H. J. (2000). A combined OFDM/SDMA approach. IEEE Journal on Selected Areas in Communications, 18, 2312–2321.

2. Hanzo, L., & Keller, T. (2007). OFDM and MC-CDMA: A primer. Hoboken: Wiley-IEEE Press.

3. Lim, C., Yoo, T., Clerckx, B., Lee, B., & Shim, B. (2013). Recent trend of multiuser MIMO in LTE-advanced. IEEE Communications Magazine, 51, 127–135.

4. Hanzo, L., Münster, M., Choi, B., & Keller, T. (2003). OFDM and MC-CDMA for broadband multi-user communications, WLANs and broadcasting. England: Wiley-IEEE Press.

5. Proakis, J., & Salehi, M. (2007). Digital communications (5th ed.). New York: McGraw-Hill.

6. Kim, J., Moon, S., & Lee, I. (2010). A new reduced complexity ML detection scheme for MIMO systems. IEEE Transactions on Communications, 58, 1302–1310.

7. Kyeong, K. J., Jiang, Y., Iltis, R. A., & Gibson, J. D. (2005). A QRD-M/Kalman filter-based detection and channel estimation algorithm for MIMO-OFDM systems. Wireless Communications, IEEE Transactions on, 4, 710–721.

8. Sulyman, A. I., Al-Zahrani, Y., Al-Dosari, S., Al-Sanie, A., Al-Shebeili, S., & Tarokh, V. (2012). Two-stage constellation partition algorithm for reduced-complexity multiple-input multiple-output-maximum-likelihood detection systems. Communications, IET, 6, 3350–3357.

9. Amiri, K., Cavallaro, J. R., Dick, C., & Rao, R. M. (2009). A high throughput configurable SDR detector for multi-user MIMO wireless systems. Journal of Signal Processing Systems, 62, 233–245.

10. Haris, P. A., Gopinathan, E., & Ali, C. K. (2011). Artificial bee colony and tabu search enhanced TTCM assisted MMSE multi-user detectors for rank deficient SDMA-OFDM system. Wireless Personal Communications, 65, 425–442.

11. Panagiotis, B., Ng, S., & Hanzo, L. (2013). Quantum search algorithms, quantum wireless, and a low-complexity maximum likelihood iterative quantum multi-user detector design. IEEE Access, 1, 94–122.

12. Goldberg, D. (1989). Genetic algorithms in search, optimization and machine learning. Boston: Addison-Wesley Longman Publishing Co., Inc.

13. Lin, D., Xu, Y., Song, W., Luo, H., & Liu, X. (2004). Genetic algorithm based multiuser detection for CDMA systems. In Emerging technologies: frontiers of mobile and wireless communication, 2004. Proceedings of the IEEE 6th circuits and systems symposium on, vol. 1, pp. 321–324.

14. Mitchell, M. (1999). An introduction to genetic algorithms. India: Springer.

15. Sumathi, S., Hamsapriya, T., & Surekha, P. (2008). Evolutionary intelligence: An introduction to theory and applications with matlab. India: Springer.

16. Haris, P., Gopinathan, E., & Ali, C. K. (2010). Performance of some metaheuristic algorithms for multiuser detection in TTCM-assisted rank-deficient SDMA-OFDM system. EURASIP Journal on Wireless Communications and Networking.

17. Juntti, M. J., Schlosser, T., & Lilleberg, J. O. (1997). Genetic algorithms for multiuser detection in synchronous CDMA. In Information theory. 1997. Proceedings., 1997 IEEE international symposium on, p. 492.

18. Wang, X. F., Lu, W. S., & Antoniou, A. (1998). A genetic-algorithm-based multiuser detector for multiple-access communications. In Circuits and systems, 1998. ISCAS ‘98. Proceedings of the 1998 IEEE international symposium on, vol. 4, pp. 534–537.

19. Yen, K., & Hanzo, L. (2003). Antenna-diversity-assisted genetic-algorithm-based multiuser detection schemes for synchronous CDMA systems. Communications, IEEE Transactions on, 51, 366–370.

20. Jiang, M., & Hanzo, L. (2004). Genetically enhanced TTCM assisted MMSE multi-user detection for SDMA-OFDM. In Vehicular technology conference, 2004. VTC2004-fall. 2004 IEEE 60th, vol. 3, pp. 1954–1958.

21. Spina, M. L. (2010). Parallel genetic algorithm engine on an FPGA. Master of Science Computer Science and Engineering South Florida University.

22. Moreno-Armendáriz, M. A., Cruz-Cortés, N., & León-Javier, A. (2010). A novel hardware implementation of the compact genetic algorithm. In Reconfigurable computing and FPGAs (ReConFig), 2010 international conference on, pp. 156–161.

23. Vavouras, M., Papadimitriou, K., & Papaefstathiou, I. (2009). High-speed FPGA-based implementations of a genetic algorithm. In Systems, architectures, modeling, and simulation, 2009. SAMOS ‘09. International symposium on, 2009, pp. 9–16.

24. Alansi, M., Elshafiey, I., & Al-Sanie, A. (2012). Genetic algorithm optimization tool for multi-user detection of SDMA-OFDM systems. Presented at the PIERS proceedings, Kuala Lumpur-Malaysia, March 27–30, 2012.

25. Alansi, M., Elshafiey, I., & Al-Sanie, A. (2011). Genetic algorithm implementation of multi-user detection in SDMA-OFDM systems. In IEEE international symposium on signal processing and information technology (ISSPIT), Bilbao-Spain, pp. 316–320.

26. Ke-Lin, D., & Swamy, M. (Eds.). (2010). Wireless communication systems: From RF subsystems to 4G enabling technologies. Cambridge: Cambridge University Press.

27. Alansi, M., Elshafiey, I., & Al-Sanie, A. (2011). Multi user detection for SDMA OFDM communication systems. In Saudi international electronics, communications and photonics conference (SIECPC), Ryaidh-KSA, pp. 1–5.

28. Chakchai, S., Raj, J., & Tamimi, A. (2009). Scheduling in IEEE 802.16e mobile WiMAX networks: Key issues and a survey. IEEE Journal on Selected Areas in Communications, 27, 156–171.

29. Jubair, G. (2009). Performance evaluation of WiMAX/IEEE 802.16 OFDM physical layer. Master of Electrical Engineering Telecommunication, Bleking Institute of Technology, 2009.

30. Vaiapury, K., Malmurugan, N., & Kumaran, S. (2009). Performance evaluation of preamble detection under ITU and SUI channel models in mobile WiMAX. Presented at the First International Conference on COMmunication Systems and NETworkS Next Generation Internetworking 2009.

31. X. Company. (2014). Xilinx Virtex-6 documentation. http://www.xilinx.com/support/documentation/virtex-6.htm.

Mohammed Alansi received his B.Eng. degree in electronics and communication engineering from Ibb University, Yemen in 2006. He obtained his M.Sc. degree in Electrical Engineering from King Saud University, Saudi Arabia in 2012. He is currently a lecturer in the college of engineering at Wadi Aldwaser, Prince Sattam bin Abdulaziz University. His research interests lie in wireless communication, including multiuser detection, channel estimation, SDMA/MIMO-OFDM systems and FPGA implementation of communication systems.

Ibrahim Elshafiey received his B.S. degree in communications and electronics engineering from Cairo University in 1985. He obtained his M.S. and Ph.D. degrees from Iowa State University in 1992 and 1994, respectively. He is currently a professor in the Electrical Engineering Department, King Saud University, on loan from the Electrical Engineering Department, Fayoum University, Egypt. His research interests include computational electromagnetics, communication systems, biomedical imaging and nondestructive evaluation.

Abdulhameed Al-Sanie received his B.S. and M.Sc. degrees in Electrical Engineering from King Saud University, Riyadh, Saudi Arabia in 1983 and 1987, respectively. He obtained his Ph.D. degree from Syracuse University, New York, USA in 1992. He is currently an Associate Professor in the Electrical Engineering Department, King Saud University. His research interests are MIMO communication systems, space-time codes, coded modulation and ARQ systems.

Ahmed Mabrouk received his Ph.D. degree in Electrical and Computer Engineering from Boston University in 1998. He worked with Bell Labs Innovations, US, Zarlink Semiconductors, Canada, and Mimos, Malaysia, in various capacities on the development of ASIC chips for wireless communications. Over the past 20 years, he has worked for industry and academia across three continents. Dr. Mabrouk is a consultant of several local and international institutes. His research interests cover VLSI architectures for signal processing, computational vision and audio processing for low bit-rate communication.