
New tensor factorization based approaches for blind source separation

Bahador Makki Abadi

Thesis submitted to the University of Surrey in candidature of the Degree of Doctor of Philosophy

UNIVERSITY OF SURREY

Department of Computing

Faculty of Engineering and Physical Sciences University of Surrey

Guildford, Surrey GU2 7XH, U.K.

November 2011

ProQuest Number: 27607807

All rights reserved.

INFORMATION TO ALL USERS
The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

ProQuest 27607807

Published by ProQuest LLC (2019). Copyright of the Dissertation is held by the Author. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. Microform Edition © ProQuest LLC.

ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346

Abstract

Blind source separation (BSS), aimed at estimation of the original source signals from their mixtures without any (or with minor) knowledge about the sources or the mixing medium, is an exciting area of research due to its various applications.

Recently, tensor factorization (TF) has been employed for blind modelling of biomedical data to estimate the signatures of desired sources and identify the mixing system by factorizing the second/higher order statistics of the mixtures. The approaches proposed in this thesis extend the conventional TF methods to exploit nonstationarity of the sources in developing new BSS methodologies.

For instantaneous mixtures, we propose a novel method, the so-called first order blind source separation (FOBSS) method, to factorize the mixture signals. This method has been used for separation of EEG signals and of linear mixtures of speech signals, and achieves higher accuracy and robustness in both separation and identification of moderately correlated sources.

The FOBSS method is then extended for separation of mutually correlated subcomponents (P3a and P3b) of event related potentials (ERPs).

In the case of nonstationary sources with sparse events, a new TF based underdetermined BSS method is developed which exploits block sparsity of the sources. This method overcomes the traditional bounds on the maximum number of separable sources in the context of TF. The method, called UOM-BSS, has been used for separation of synthetic and real block sparse signals such as speech.

In addition, with regard to convolutive mixtures, a novel TF based convolutive BSS (CBSS) method has been developed in the time domain by proposing an extended version of FOBSS for separation of sources with sparse events. This method has been applied to the separation of heart and lung sound signals from their convolutive mixtures. Finally, a semi-blind version of the proposed TF CBSS is introduced. This method has been applied to the separation of speech signals when some a priori information about the locations of the speakers and the microphones is available. The results demonstrate the higher performance of the semi-blind method compared with that of the blind CBSS.

Acknowledgements

I would like to thank my supervisor Dr. Saeid Sanei for all his support during my research work at both Cardiff University and the University of Surrey. His enthusiastic supervision, invaluable comments, and personal guidance helped me throughout my PhD studies. I am also grateful to all my friends for providing a motivating research environment within our research group.

In addition, I am very thankful to my family and my parents-in-law. It would have been impossible for me to finish this work without the continuous support and encouragement I received from my beloved wife, Bahareh. She has lost a lot due to my research abroad. I appreciate her understanding, patience, and her love for me and our daughter Sophia. I must also express my gratitude to my parents for their love and prayers for my success. I also wish to express my appreciation to the Department of Computing, Faculty of Engineering and Physical Sciences, University of Surrey, for their financial support.

Above all, I thank God for providing me the opportunity to step into the excellent world of science and granting me the capability to proceed successfully.

Declaration of Originality

This thesis and the work to which it refers are the results of my own efforts. Any ideas, data, images or text resulting from the work of others (whether published or unpublished) are fully identified as such within the work and attributed to their originator in the text, bibliography or in footnotes. This thesis has not been submitted in whole or in part for any other academic degree or professional qualification. I agree that the University has the right to submit my work to the plagiarism detection service TurnitinUK for originality checks. Whether or not drafts have been so assessed, the University reserves the right to require an electronic version of the final document (as submitted) for assessment as above.

Contents

Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
List of Abbreviations

1 Introduction
1.1 Overview
1.1.1 Strategies of the thesis
1.1.2 Organisation of the thesis

2 Blind Source Separation and Tensor Factorization
2.1 Problem statement
2.1.1 Indeterminacies of the problem
2.2 State of the art in blind source separation
2.3 Whitening process using PCA
2.4 ICA based algorithms
2.4.1 Joint diagonalization based ICA methods
2.5 Multi-way data representation and factorization
2.5.1 Special tensors
2.5.2 Unfolding the tensors
2.5.3 Useful matrix products
2.6 Multi-way notations
2.6.1 Inner product of tensors
2.6.2 Norm of a tensor
2.6.3 Rank of a tensor and rank one tensor
2.6.4 Tensor-tensor multiplications
2.7 Multi-way models
2.7.1 Tucker model
2.7.2 CANDECOMP/PARAFAC model
2.7.3 INDSCAL
2.7.4 PARAFAC2
2.8 Tensor factorization for BSS
2.9 Conclusions

3 First Order Blind Source Separation of Nonstationary Sources
3.1 Introduction
3.2 Problem formulation
3.2.1 Model and assumptions
3.3 Parameter estimation
3.3.1 Estimating …
3.3.2 Estimating A
3.3.3 Estimating …
3.3.4 First order blind source separation algorithm
3.3.5 Relation to the second order model
3.3.6 Uniqueness of first order model based on the essential uniqueness condition of the second order model
3.3.7 Fast FOBSS algorithm and its relation with AJD methods
3.4 Experimental results
3.4.1 Simulated data results

4 A New Method for Blind Source Separation of Correlated Sources by Modifying the PARAFAC2 Model
4.1 Introduction
4.2 Model and problem formulation
4.3 Estimation of the model parameters
4.3.1 Estimation of P_k
4.3.2 Estimation of A, H_k, and …
4.4 Experimental results
4.4.1 Simulated data
4.4.2 Real data
4.5 Conclusions

5 A k0-SCA based Tensor Factorization Approach for Underdetermined Blind Identification and Source Separation of Sources with Sparse Events
5.1 Introduction
5.2 Model and problem formulation
5.2.1 Uniqueness of the proposed model
5.3 Estimation of the model parameters
5.3.1 An orthogonal model for underdetermined BSS
5.3.2 Robustness of the algorithm to outliers
5.4 Experimental results
5.5 Conclusions

6 Blind and Semi-Blind Source Separation of Convolutive Mixtures using the First Order Model
6.1 Introduction
6.2 Frequency-domain convolutive blind source separation
6.3 A time-domain TF approach for convolutive blind source separation
6.3.1 Estimation of the model parameters
6.3.2 Estimation of P_k
6.3.3 Estimation of D_k
6.3.4 Estimation of …
6.3.5 Simulation results
6.4 Semi-blind time-domain approach
6.4.1 Majorization
6.4.2 Semi-blind geometrically constrained estimation of A_k
6.4.3 Semi-blind CBSS simulation results
6.5 Conclusions

7 Conclusions and Future Research
7.1 Summary and conclusion
7.2 Future work

List of Publications

The publications listed below account in part for the originality of the work presented herein.

Journal papers

B. Makki Abadi and S. Sanei, "A new approach to blind identification and separation of nonstationary source signals", IEEE Transactions on Signal Processing. (Submitted)

B. Makki Abadi and S. Sanei, "A novel k0-SCA based tensor factorization approach for underdetermined blind identification and source separation of sources with sparse events", IEEE Transactions on Signal Processing. (Submitted)

B. Makki Abadi and S. Sanei, "A k0-SCA based underdetermined blind identification of sources with sparse events", IEEE Signal Processing Letters. (Submitted)

B. Makki Abadi, S. Sanei, and D. Jarchi, "A new approach to blind identification and separation of nonstationary seizure signals", IEEE Transactions on Biomedical Engineering. (Submitted)

B. Makki Abadi and S. Sanei, "A geometrically constrained time domain approach for convolutive blind source separation", IEEE Signal Processing Letters. (To be submitted)

B. Makki Abadi, D. Jarchi, and S. Sanei, "Blind separation and localization of correlated brain signals using extended PARAFAC2 tensor model", IEEE Transactions on Biomedical Engineering. (To be submitted)

Published Book Chapter

S. Sanei and B. Makki Abadi, "Tensor factorization with application to convolutive blind source separation of speech", in Machine Audition: Principles, Algorithms and Systems, IGI-Global Pub., edited by W. Wang, 2009.

Conference Publications

B. Makki Abadi, D. Jarchi, and S. Sanei, "Blind separation and localization of correlated P300 subcomponents from single trial recordings using extended PARAFAC2 tensor model", Proc. of Engineering in Medicine and Biology Conf. (EMBC), Sep. 2011.

B. Makki Abadi, D. Jarchi, V. Abolghasemi, and S. Sanei, "A geometrically constrained multimodal time domain approach for convolutive blind source separation", Proc. of European Signal Processing Conf. (EUSIPCO), Aug. 2011.

B. Makki Abadi, S. Sanei, and D. Marshall, "A k-dimensional subspace based tensor factorization approach for underdetermined blind identification", Proc. of the IEEE Asilomar Conf. on Signals, Systems and Computers (ACSSC), Nov. 2010.

B. Makki Abadi, F. Ghaderi, and S. Sanei, "A new tensor factorization approach for convolutive blind source separation in time domain", Proc. of European Signal Processing Conf. (EUSIPCO), Aug. 2010.

B. Makki Abadi, D. Jarchi, and S. Sanei, "Simultaneous localization and separation of biomedical signals by tensor factorization", Proc. of IEEE Workshop on Statistical Signal Processing, 2009.

B. Makki Abadi, A. Sarrafzadeh, F. Ghaderi, and S. Sanei, "Semi-blind channel estimation in MIMO communication by tensor factorization", Proc. of IEEE Workshop on Statistical Signal Processing, 2009.

B. Makki Abadi, A. Sarrafzadeh, D. Jarchi, V. Abolghasemi, and S. Sanei, "Semiblind signal separation and channel estimation in MIMO communication systems by tensor factorization", Proc. of IEEE Workshop on Statistical Signal Processing, 2009.

Other Collaborative Publications

D. Jarchi, S. Sanei, J. C. Principe, and B. Makki Abadi, "A new spatiotemporal filtering method for single-trial estimation of correlated ERP subcomponents", IEEE Trans. on Biomedical Engineering, vol. 58, no. 1, pp. 132-143, Jan. 2011.

S. Ferdowsi, V. Abolghasemi, B. Makki Abadi, and S. Sanei, "A new spatially constrained NMF with application to fMRI", Proc. of Engineering in Medicine and Biology Conf. (EMBC), Sep. 2011.

D. Jarchi, B. Makki Abadi, and S. Sanei, "Instantaneous phase tracking of oscillatory signals using EMD and Rao-Blackwellised particle filtering", Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2011.

F. Ghaderi, B. Makki Abadi, J. G. McWhirter, and S. Sanei, "Blind source extraction of cyclostationary sources with common cyclic frequencies", Proc. of IEEE Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Mar. 2010.

F. Ghaderi, S. Sanei, B. Makki Abadi, V. Abolghasemi, and J. G. McWhirter, "Heart and lung sound separation using periodic source extraction method", in Proc. 16th Int. Conf. on Digital Signal Processing, 2009.

V. Abolghasemi, S. Ferdowsi, B. Makki Abadi, and S. Sanei, "On optimization of the measurement matrix for compressive sensing", Proc. of European Signal Processing Conf. (EUSIPCO), Aug. 2010.

D. Jarchi, B. Makki Abadi, and S. Sanei, "A new spatiotemporal filtering method for single-trial ERP subcomponent estimation", Proc. of European Signal Processing Conf. (EUSIPCO), Aug. 2010.

D. Jarchi, B. Makki Abadi, and S. Sanei, "Mental fatigue analysis by measuring synchronization of brain rhythms", Proc. of IEEE Workshop on Cognitive Information Processing, July 2010.

D. Jarchi, B. Makki Abadi, and S. Sanei, "Separating and tracking ERP subcomponents using constrained particle filter", in Proc. of Int. IEEE Conf. on Digital Signal Processing, 2009.

D. Jarchi, B. Makki Abadi, and S. Sanei, "Estimation of trial to trial variability of P300 subcomponents by coupled Rao-Blackwellised particle filtering", Proc. of IEEE Workshop on Statistical Signal Processing, 2009.

List of Figures

2.1 A third order tensor and its modes [1]

2.2 (a) Columns, rows, and tubes of a tensor and (b) horizontal, lateral, and frontal slabs of a tensor [2]

2.3 A three-way super diagonal tensor [2]

2.4 Unfolding a tensor along its third mode [1]

2.5 Multiway models (taken from [3])

2.6 Tucker model [2]

2.7 PARAFAC model and rank one tensors [2]

3.1 Sample space-time-frequency decomposition of the EEG signals for two dominant disjoint sources; temporal, spectral and spatial signatures of the first separated component [4]

3.2 Sample space-time-frequency decomposition of the EEG signals for two dominant disjoint sources; temporal, spectral and spatial signatures of the second separated component [4]

3.3 Simulated sources; slowly varying envelopes by random vectors (1st row), up-sampled and smoothed envelopes (2nd row), random source signals (3rd row), and modulated source signals (4th row)

3.4 Comparing FOBSS and SOBSS results for orthogonal and non-orthogonal (at the presence of outlier) scenarios

3.5 Average channel identification error in different SNRs for separation of modulated uniform random sources

3.6 Average channel identification error in different SNRs for separation of modulated uniform random sources when the sources are set to be correlated

3.7 Average channel identification error in different SNRs for separation of modulated Gaussian random sources

3.8 Average channel identification error in different SNRs for separation of modulated Gaussian random sources when the sources are correlated

3.9 Average channel identification error in different SNRs for separation of mixed speech signal sources

3.10 Real 16-channel EEG signal

3.11 Results of separation of 10 sources using the proposed method; (top) estimated topographies and (bottom) separated sources

3.12 Results of separation of 10 sources using ICA method; (top) estimated topographies and (bottom) separated sources

4.1 Different ERP components with different latencies and polarities (taken from [5])

4.2 Simulated P300 and its subcomponents P3a and P3b [6]

4.3 Results for synthetic data; (a) original and estimated P300 subcomponents at SNR=-10 dB, top left and right using proposed method (error=-23.28 dB), bottom left and right using sPCA method (error=-7.80 dB) and (b) original spatial information in the middle row, estimated spatial information using the proposed method (error=-8.86 dB) and estimated spatial information using sPCA method (error=1.03 dB) in the bottom row

4.4 Latencies of the estimated P300 subcomponent signals, on top P3a latencies using proposed method and sPCA method, at bottom P3b latencies using proposed method and sPCA method

4.5 Results for real data; (a) estimated P3a and P3b signals, top left and right using proposed method, bottom left and right using sPCA method and (b) estimated topographies of P3a and P3b signals, top left and right, using the proposed method, and estimated spatial information of P3a and P3b using sPCA method in the bottom row

5.1 Identification error for different number of sources (Nx = 3, SNR=20 dB)

5.2 Identification error for different number of sources (Nx = 3, SNR=10 dB)

5.3 Identification error for different number of sources (Nx = 4, SNR=20 dB)

5.4 Identification error for different number of sources (with outliers, Nx = 3, SNR=20 dB)

5.5 Envelopes of 6 speech signals (top) and the number of active sources at each segment (bottom)

5.6 Identification error for different number of speech sources (SNR = 20 dB)

5.7 Measured SIR for different number of speech sources (SNR = 20 dB)

5.8 Original and estimated sources (SNR = 20 dB, SIR = 15.064 dB, channel error = -17.53 dB)

6.1 Original and separated profiles (D_k) of the source signals; top, heart sound, bottom, lung sound

6.2 Original signals on top, convolutive mixtures in the middle, and separated sources for both methods at the bottom two rows

6.3 Iterative majorization minimization of σ(X) using the majorizing function μ(X, X(k))

6.4 Original and separated profiles (D_k) of the source signals

6.5 Simulated impulse responses on top left, shortest path on top right, semi-blindly estimated on bottom left, and on bottom right semi-blindly estimated impulse response between s3 and m2

6.6 Original signals on top, convolutive mixtures in the middle, and separated sources for both methods at the bottom two rows

List of Tables

2.1 Mixing models for instantaneous, anechoic, and convolutive mixtures

2.2 Unmixing models for instantaneous, convolutive anechoic, and convolutive echoic blind source separation

2.3 Tucker model and its extensions [3]

2.4 PARAFAC model and its extensions [3]

4.1 Spatial error between original and separated P300 subcomponents in different SNRs

4.2 Temporal error between original and separated P300 subcomponents in different SNRs

4.3 Latencies of estimated P300 subcomponents for different SNRs

5.1 Upper bounds for maximum possible Ns versus Nx using SOBIUM and FOOBI methods [7], [8]

5.2 SIR measures between original and separated speech signals (average SIR=15.06 dB)

5.3 SIR measures between 10 original and separated speech signals (average SIR=7.66 dB)

6.1 Measured SIR levels between original and separated signals using both methods

6.2 Measured SIR levels between original and separated signals using both blind and semi-blind methods

List of Abbreviations

AJD Approximate Joint Diagonalization
ALS Alternating Least Squares
AMUSE Algorithm for Multiple Unknown Signals Extraction
BG Block Gaussian
BSCS Blind Separation of Correlated Sources
BSS Blind Source Separation
CANDECOMP Canonical Decomposition
CBSS Convolutive BSS
DOA Direction-of-Arrival
ECG Electrocardiogram
EEG Electroencephalogram
ERP Event Related Potential
FAJD Fast Approximate Joint Diagonalization
FastICA A Fast Fixed-Point Independent Component Analysis
FFDIAG Fast Frobenius Diagonalization
FIR Finite Impulse Response
fMRI Functional Magnetic Resonance Imaging
FO First Order
FOBSS First Order Blind Source Separation
FOOBI Fourth Order Cumulant based Blind Identification of Underdetermined mixtures
HOOI Higher-Order Orthogonal Iteration
HOS Higher Order Statistics
HOSVD Higher Order SVD
ICA Independent Component Analysis
INDSCAL Individual Differences in Scaling

KL Kullback-Leibler
k-SCA k-Sparse Component Analysis
MAP Maximum A Posteriori
MBD Multichannel Blind Deconvolution
MEG Magnetoencephalogram
MFD Minimal Filter Distortion
MI Mutual Information
NNLS Non-Negative Least Squares
NTF Nonnegative Tensor Factorization
OMP Orthogonal Matching Pursuit
PARAFAC Parallel Factors Analysis
PC Principal Components
PCA Principal Component Analysis
PD Positive Definite
pdf probability density function
PSD Positive-Semidefinite
SCA Sparse Component Analysis
SIR Signal to Interference Ratio
SNR Signal-to-Noise Ratio
SO Second Order
SOBI Second Order Blind Identification
SOBIUM Second Order Blind Identification of Underdetermined Mixtures
SOBSS Second Order Blind Source Separation
SOS Second Order Statistics
sPCA spatial PCA
STFD Spatial Time Frequency Distribution
STFT Short-Term Fourier Transform
SVD Singular Value Decomposition
TALS Trilinear Alternating Least Squares
TF Tensor Factorization
tPCA temporal PCA
UBI Underdetermined Blind Identification

UBSS Underdetermined BSS
UOM-BSS Underdetermined Orthogonal Model BSS

Chapter 1

Introduction

1.1 Overview

Blind source separation (BSS) is the method for estimating original source signals from their mixtures observed at a number of sensors. BSS is an exciting field of research in statistical signal processing due to its applications in various fields such as speech analysis, biomedical signal processing, and digital communications [9], [10], [11], [12]. The main appeal of BSS is that source separation has to be achieved without using any training data. Instead, only weak assumptions regarding the sources and the unknown medium are permitted. The cocktail party problem [10], [13] is one general example of the BSS problem. In a cocktail party, many people talk to each other simultaneously, yet a listener can discern the voice of a particular speaker from a myriad of other voices. This ability to select one voice in such an uncontrolled acoustic environment is possible because the human brain learns how to exploit several auxiliary factors such as the probability of frequent words in general sentences, the accent of the speaker, the speaker's facial expression, and discrimination between male and female voices. Therefore, the objective in a BSS problem, as mentioned earlier, is to extract the unknown sources from their observed mixtures when the medium is also unknown. Unlike the human brain, which benefits from multi-modal information, most BSS algorithms are uni-modal (e.g. using just audio information), although a number of bi-modal or multi-modal BSS algorithms have also been proposed [14], [15]. The uni-modality and the blindness of BSS imply that BSS techniques have to fully benefit from the weak assumptions concerning the sources and the mixing environment.

One of the most common approaches, which assumes statistical independence of the sources, is called independent component analysis (ICA). ICA is a powerful statistical tool that seeks to decompose the data into a set of signals that are mutually statistically independent. There are three major approaches in using ICA for BSS [16]:

1. Factorising the joint probability density function (pdf) of the reconstructed signals into its marginal pdfs. Under the assumption that the source signals are stationary and non-Gaussian, the independence of the reconstructed signals can be measured by a statistical distance between the joint distribution and the product of its marginal pdfs; the Kullback-Leibler (KL) divergence (distance) is an example. For non-stationary cases and for short-length data, the pdfs are poorly estimated; therefore, in such cases this approach may not lead to good results.

2. Maximizing the non-Gaussianity of the estimated source signals. This concept is equivalent to minimizing the mutual information (MI) between the sources, which may be considered as a measure of independence.

3. Eliminating the temporal cross-correlation functions of the reconstructed signals as much as possible. To do so, the correlation matrices of the observations at different time lags are diagonalized simultaneously. Here, second order statistics are normally used. As another advantage, this approach can be applied in the presence of white noise, since such noise can be avoided by using the cross-correlations only for lags τ ≠ 0. Such a method is appropriate for stationary and weakly stationary sources (i.e. when the stationarity condition holds within a short segment of data).
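The third approach can be illustrated with a minimal sketch in the spirit of the classical AMUSE algorithm: whiten the mixtures, then eigendecompose a single symmetrized time-lagged covariance matrix. This is only an illustration of the second order idea, not one of the methods proposed in this thesis; the `amuse` function and the toy sources below are our own illustrative choices.

```python
import numpy as np

def amuse(X, tau=1):
    """AMUSE-style second order BSS: whiten the mixtures, then
    eigendecompose one symmetrized time-lagged covariance matrix.
    X: mixtures of shape (n_channels, n_samples)."""
    X = X - X.mean(axis=1, keepdims=True)
    # Whitening from the zero-lag covariance R(0)
    R0 = X @ X.T / X.shape[1]
    d, E = np.linalg.eigh(R0)
    W_white = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    Z = W_white @ X
    # Lagged covariance R(tau) of the whitened data, symmetrized;
    # its eigenvectors supply the remaining rotation.
    R_tau = Z[:, :-tau] @ Z[:, tau:].T / (Z.shape[1] - tau)
    R_tau = (R_tau + R_tau.T) / 2
    _, V = np.linalg.eigh(R_tau)
    W = V.T @ W_white                 # overall unmixing matrix
    return W @ X, W

# Toy example: a sinusoid and white noise have clearly different
# lag-1 autocorrelations, so a single lag suffices to separate them.
rng = np.random.default_rng(0)
n = 5000
S = np.vstack([np.sin(0.02 * np.arange(n)), rng.standard_normal(n)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])    # unknown mixing matrix
S_hat, W = amuse(A @ S)
```

Separation is only up to the usual permutation and scaling ambiguities, so the recovered rows are compared to the true sources by absolute correlation.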

In the majority of cases it is assumed that the number of sources is known. This assumption avoids any ambiguity caused by a false estimation of the number of sources. In exactly-determined cases the number of sources is equal to the number of mixtures, while in over-determined situations the number of mixtures exceeds the number of sources. However, the BSS problem is even more complicated in under-determined cases, when there are fewer sensors than sources. In such a case, a practical assumption regarding the sources can be made, i.e. having sparse/disjoint sources. Sparsity of the sources refers to the situation where the sources are mostly inactive in the time domain. In these cases the sources can be considered disjoint, thus enabling one to exploit the structure of the mixing process [13]. A signal may also be sparse in domains other than time. Methods which rely on sparsity of the sources are called sparse component analysis (SCA) based methods. SCA is generally a technique which extracts sparse signals from their observations.
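The disjointness idea can be sketched in a few lines (our own toy illustration, not the method developed in Chapter 5): when at most one source is active at a time, every observation vector x(t) is parallel to one column of the mixing matrix A, so the column directions can be recovered from the sample directions even with fewer sensors than sources. The contiguous, known activity segments below are an assumption made purely to keep the example short; in general a clustering step replaces the per-segment average.

```python
import numpy as np

rng = np.random.default_rng(1)
n_src, n_mix, seg = 3, 2, 1000        # 3 sources, only 2 sensors

# Disjoint sources: each source is active in its own time segment only.
S = np.zeros((n_src, n_src * seg))
for i in range(n_src):
    S[i, i * seg:(i + 1) * seg] = rng.standard_normal(seg)

A = rng.standard_normal((n_mix, n_src))
A /= np.linalg.norm(A, axis=0)        # unit-norm mixing columns
X = A @ S                             # underdetermined mixtures (2 x 3000)

# Each x(t) is parallel to one column of A: normalize the samples,
# fix the sign, then average within each activity segment.
U = X / np.linalg.norm(X, axis=0, keepdims=True)
U = U * np.sign(U[0])
A_hat = np.stack([U[:, i * seg:(i + 1) * seg].mean(axis=1)
                  for i in range(n_src)], axis=1)
A_hat /= np.linalg.norm(A_hat, axis=0)
```

Note that this identifies the mixing system; recovering the sources themselves additionally requires assigning each sample to its active column.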

Joint diagonalization of second order statistics (SOS) and higher order statistics (HOS) of the mixture signals is the most common approach to implementing ICA [17], [18].

On the other hand, tensor factorization is a powerful tool that has been employed to jointly diagonalize SOS or HOS information [19], [20]. Moreover, tensor factorization is able to tackle the under-determined blind identification (UBI) problem, which estimates the mixing system (not the sources), even without a sparsity assumption on the sources [7]. All of the above methods were developed to tackle one specific case of the BSS problem, called instantaneous BSS. Blind source separation as a hot topic within the signal processing and neural networks communities dates back to the work of Hérault and Jutten at the French conference GRETSI in 1985 [21]. During the last two decades BSS has evolved into three main classes, notably instantaneous, anechoic, and echoic/convolutive BSS. When the high signal propagation velocity allows the assumption that the mixtures impinge on the sensors without any relative delay, the problem is called instantaneous blind source separation. It arises in a number of biomedical applications such as separation of electrocardiogram (ECG), electroencephalogram (EEG), and magnetoencephalogram (MEG) signals [16], [22], [23]. On the other hand, anechoic BSS can be seen as an intermediate between instantaneous and convolutive/echoic BSS, since only one path exists for each source-sensor pair and the transmission involves delay only. It refers to the situation whereby, due to low propagation speed, a delay is associated with each source in the mixtures through the direct paths. Examples of such a scenario are: a group of persons talking in an open area, the acoustics in an anechoic chamber or acoustic room, Doppler frequency-shifts differing between mobile sensors and sources [24], and spatial shifts from reflections through window glass [25]. In convolutive or echoic BSS, each element of the mixing matrix is in fact an impulse response filter that simulates multipaths from the sources to the sensors.

In this case, the past as well as the present samples of the source signals are needed to produce the current mixture sample. In this thesis we exploit the important features of the tensor factorization methodology to develop new blind source separation methods. All three types of BSS problems above (linear exactly-determined or over-determined, linear under-determined, and convolutive BSS problems) are addressed in this thesis with a view to improving the existing BSS techniques for separation of nonstationary sources or separation of sources with sparse events.
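The three mixing classes can be written side by side as a short sketch (a toy illustration with arbitrary parameters, not part of the proposed methods): instantaneous mixing x(t) = A s(t); anechoic mixing with one gain and one integer-sample delay per source-sensor path; and convolutive mixing, where each path is an FIR filter.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_src, n_mix, L = 4000, 2, 2, 8
S = rng.standard_normal((n_src, n))
A = rng.standard_normal((n_mix, n_src))

# Instantaneous: x(t) = A s(t); no relative delay between paths.
X_inst = A @ S

# Anechoic: one direct path per source-sensor pair, with gain a_ij
# and integer sample delay d_ij.
D = np.array([[0, 3], [2, 0]])
X_anech = np.zeros((n_mix, n))
for i in range(n_mix):
    for j in range(n_src):
        d = D[i, j]
        X_anech[i, d:] += A[i, j] * S[j, :n - d]

# Convolutive (echoic): each path is an FIR filter h_ij of length L,
# so x_i(t) = sum_j sum_l h_ij(l) s_j(t - l).
H = rng.standard_normal((n_mix, n_src, L))
X_conv = np.zeros((n_mix, n))
for i in range(n_mix):
    for j in range(n_src):
        X_conv[i] += np.convolve(S[j], H[i, j])[:n]
```

With L = 1 the convolutive model collapses to the instantaneous one, and with a single nonzero tap per filter it collapses to the anechoic one, which is why the three classes form a hierarchy.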

1.1.1 Strategies of the thesis

Tensor factorization based methods have been used in many areas such as chemometrics, psychology, brain modelling, communications, image processing, bioinformatics, pattern recognition, and data mining applications [26]. More specifically, tensor factorization is known to be useful for BSS, with applications in communications, biomedical, and audio signal processing. It has been shown that tensor factorization can be used to tackle the simultaneous diagonalization problem as a part of the ICA methodology [20]. Therefore, tensor factorization can also be employed in all ICA based BSS applications.
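As a minimal illustration of the factorization machinery itself (a generic CP sketch under our own unfolding conventions, not the first order model proposed later): the CANDECOMP/PARAFAC model writes a three-way tensor as a sum of rank-one terms, and alternating least squares (ALS) updates one factor matrix at a time, each update being an ordinary linear least squares problem.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker (Khatri-Rao) product."""
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, U.shape[1])

def unfold(T, mode):
    """Matricize a 3-way tensor along the given mode (row-major layout)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def cp_als(T, rank, n_iter=300, seed=0):
    """Fit the CP model T ≈ sum_r a_r ∘ b_r ∘ c_r by alternating least
    squares; each step solves for one factor with the others fixed."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(n_iter):
        # unfold(T, 0) = A @ khatri_rao(B, C).T under this layout,
        # and analogously for the other two modes.
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C)).T
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C)).T
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

# Recover the factors of a random rank-3 tensor.
rng = np.random.default_rng(3)
A0, B0, C0 = (rng.standard_normal((s, 3)) for s in (4, 5, 6))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, rank=3)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
```

The factors are recovered only up to permutation and scaling of the rank-one terms, which is exactly the (mild) ambiguity that makes CP attractive for blind identification.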

In this thesis we propose new tensor factorization based BSS methods for biomedical and audio signal separation applications. Our main objective is to employ the tensor factorization concept in order to improve the performance of blind source separation and identification for instantaneous, exactly/under-determined, and convolutive mixtures of nonstationary sources. The objectives of this research are listed as follows:

1. Reviewing state of the art research on m ajor BSS m ethods w ith more focus on joint diagonalization based m ethods and consequently tensor factorization concept. 2. Developing an efficient and robust tensor based BSS m ethod which exploits th e

nonstationarity of the independent sources.

3. Proposing a suitable BSS method which can tolerate dependency of the sources. This method can be used for separation of highly correlated brain signals such as event related potentials.

4. Exploiting block sparsity of the signals to propose new underdetermined blind identification and separation methods which, unlike traditional underdetermined methods, exploit event sparsity.

5. Developing techniques in the time domain to tackle the blind and semi-blind convolutive blind source separation problems for separation of nonstationary independent sources.


1.1.2 Organisation of the thesis

Chapter 2 is devoted to reviewing the BSS and tensor factorization concepts. The theoretical conditions under which proper blind identification and separation are achievable are provided in detail. The major BSS algorithms are also described. Furthermore, the problem of joint diagonalization and its relation to tensor factorization is expressed. Finally, the basics of the tensor factorization concept, with focus on the Kruskal tensor model [27], are described.

In Chapter 3, a new orthogonal tensor model for segmented mixture signals built by mixing independent nonstationary sources, called the first order tensor model, is proposed. This model, unlike general BSS methods which use higher order, and in particular second order, statistics, deals with the signals directly. The model is more robust to violation of the independence and orthogonality assumptions on the segmented source signals. An efficient algorithm is developed to estimate the proposed model parameters.

In Chapter 4, the proposed first order tensor model is extended to separation of highly overlapped/correlated signals by defining a new tensor-based structural model which takes the correlation into account. In order to tackle underdetermined BSS, the block sparsity constraint of the segmented sources is considered in Chapter 5 for the proposed first order model. This approach improves the general upper bound, suggested by existing methods, on the maximum possible number of sources in the second order underdetermined blind identification problem.

Chapter 6 presents a novel technique for blind separation of block sparse sources from their convolutive mixtures. The proposed method is built upon the first order tensor model. Moreover, this method can benefit from geometrical information about the sources in the cocktail party problem. Consequently, a semi-blind convolutive BSS method with application in multi-modal blind source separation is proposed.

Finally, in Chapter 7, the work presented herein is summarized and some future research directions are proposed.


Chapter 2

Blind Source Separation and Tensor Factorization

2.1 Problem statement

The BSS problem is to estimate the constituent Ns sources s(t) from a given set of Nx observed mixture signals x(t), which are often contaminated by noise signals v(t), with minimum assumptions about the mixing medium and the sources. The mixing models can be summarized as shown in Table 2.1 [13], where t is the time:

In this table i = 1, ..., Nx; x_i(t) and v_i(t) denote the ith elements of the mixture and noise column vectors x(t) ∈ R^{Nx} and v(t) ∈ R^{Nx} respectively, and s_j(t) denotes the jth element of the source column vector s(t) ∈ R^{Ns}. t denotes the discrete time index, p = 1, ..., L denotes the time lag index, and a_ijp is an element of the mixing matrix A_p, corresponding to its ith row, jth column, and its associated delay. In the absence of the subscript p in a_ij it is assumed that there is at most one delay; A_p is then simply called the mixing matrix A, and A is also called the signal dictionary

Table 2.1: Mixing models for instantaneous, anechoic, and convolutive mixtures

Mixing Model      Mathematical Model
Instantaneous     x_i(t) = Σ_{j=1}^{Ns} a_ij s_j(t) + v_i(t)
Anechoic          x_i(t) = Σ_{j=1}^{Ns} a_ij s_j(t - τ_ij) + v_i(t)
Convolutive       x_i(t) = Σ_{j=1}^{Ns} Σ_{p=1}^{L} a_ijp s_j(t - τ_ijp) + v_i(t)


Table 2.2: Unmixing models for instantaneous, convolutive anechoic, and convolutive echoic blind source separation

Mixing Model        Mathematical Model
Instantaneous       y_j(t) = Σ_{i=1}^{Nx} w_ji x_i(t)
Anechoic            y_j(t) = Σ_{i=1}^{Nx} w_ji x_i(t - τ_ji)
Convolutive Echoic  y_j(t) = Σ_{i=1}^{Nx} Σ_{p=1}^{L} w_jip x_i(t - τ_jip)

or basis matrix in the SCA literature [28]. In theory, the impulse response between the ith sensor and the jth source can have infinite length (L = ∞). However, practically it is assumed that L < ∞. Moreover, in this work it is assumed that the A_p are stationary over the whole period of observation. In the EEG dipole source models [29], the stationarity of A implies that the sources have fixed locations and orientations. Similarly, in this work it is assumed that the sources do not move.

Instantaneous and convolutive source separation are the two main problems within the BSS community. Instantaneous BSS methods have better performance compared with those for anechoic systems. Similarly, convolutive BSS algorithms have better performance in anechoic scenarios [13], [24], [25], [30], [31], [32]. Generally speaking, most BSS approaches try to estimate the inverse of the generative mixing models shown in Table 2.1. So, the source separation problem for the instantaneous and convolutive cases, ignoring the noise terms, can be described as in Table 2.2, where j = 1, ..., Ns, y_j(t) denotes the jth element of the estimated source column vector y(t), and w_jip is the element of the separating or unmixing matrix W_p corresponding to its jth row, ith column, and the related delay τ_jip.
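In matrix form, the instantaneous model of Table 2.1 is simply x(t) = A s(t) + v(t). As an illustration, the following NumPy sketch generates such mixtures; the dimensions, source distribution, and noise level are arbitrary choices for this example, not values taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: Ns = 2 sources, Nx = 3 sensors, T samples.
Ns, Nx, T = 2, 3, 1000
s = rng.laplace(size=(Ns, T))              # source signals, one row per source
A = rng.standard_normal((Nx, Ns))          # instantaneous mixing matrix
v = 0.01 * rng.standard_normal((Nx, T))    # additive sensor noise

# x_i(t) = sum_j a_ij s_j(t) + v_i(t), written for all i and t at once
x = A @ s + v
```

Each row of x is one sensor signal; the anechoic and convolutive models of Table 2.1 would additionally delay (and, in the convolutive case, filter) each source before the summation.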

2.1.1 Indeterminacies of the problem

In BSS, it is not possible to uniquely estimate the source signals without some a priori knowledge. In the absence of such knowledge, there exist two ambiguities inherent to BSS, namely the permutation and the scaling ambiguities. In other words,

1. Due to the blindness of the problem, the order of the recovered sources is not known, since both the mixing matrix and the sources are unknown [9]. Thus, a change in the order of the recovered sources also implies a permutation of the corresponding columns of the mixing matrix. Therefore, any change in the order of the terms of the outer summations in Table 2.2 does not affect the result of the summations.


2. The amplitudes of the original sources cannot be determined. Since both A and s_j, j = 1, ..., Ns, are unknown, any scalar multiplier α_j of source s_j can be cancelled by dividing the corresponding column of A by the same multiplier:

x(t) = Σ_{j=1}^{Ns} (a_j / α_j)(α_j s_j(t))    (2.1)

Clearly, the effect of multiplying the source vector s_j by α_j can be cancelled out by dividing the jth column of the mixing matrix by the same factor α_j. This demonstrates that the sources can be estimated only up to a scaling constant. In some BSS methods, to overcome this problem, the variances of the source signals are assumed to be one [9], [33]. Moreover, it is notable that the scaling ambiguity also includes the sign ambiguity, i.e. the BSS model is not altered if any of the sources is multiplied by -1. These ambiguities show that the unmixing matrix W is not necessarily the exact inverse of the mixing matrix A. Instead,

W = P Λ A^†    (2.2)

where the superscript (·)^† denotes the pseudo-inverse operator, P is a permutation matrix, and Λ is a diagonal matrix conveying the scaling ambiguity.

These indeterminacies are usually expressed as scaling, permutation, and delay of the estimated source signals (the latter in the convolutive scenario), which are not so important in most real world applications. In the majority of signal processing applications, estimation of the exact amplitude, order of the signals, or even time delays is not crucial; it is desirable to have only the waveforms of the original sources, in order to extract useful information from the estimated waveforms [33].
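The relation (2.2) can be verified numerically: any matrix of the form W = PΛA^† maps noiseless mixtures back to permuted and rescaled sources. A small sketch, with all matrices chosen arbitrarily for illustration:

```python
import numpy as np

A = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [0.7, 0.3]])        # 3 x 2 mixing matrix (full column rank)
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])        # permutation matrix: swaps the two sources
Lam = np.diag([2.0, -0.5])        # diagonal scaling, including a sign flip

W = P @ Lam @ np.linalg.pinv(A)   # W = P Lambda A^dagger, as in (2.2)

# applying W to noiseless mixtures returns the sources up to P and Lambda
s = np.random.default_rng(1).standard_normal((2, 100))
y = W @ (A @ s)                   # equals P @ Lam @ s
```

Since pinv(A) A = I for a full column rank A, W A = P Λ, so the recovered y is exactly a permuted, rescaled (and here sign-flipped) copy of s.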

2.2 State of the art in blind source separation

Several algorithms have been developed in the context of BSS, each relying on different assumptions and exploiting different characteristics of the signals. This section mainly overviews ICA techniques. ICA estimates statistically independent sources, whilst SCA recovers sparse sources. ICA dates back to the early work of Hérault and Jutten [21]. Later, this work was applied to BSS in 1985. Infomax, proposed by Bell and Sejnowski [34], sparked much enthusiasm for exact-determined (i.e. equal number of sources and sensors)


and over-determined (i.e. more sensors than sources) BSS scenarios. Whenever the number of sensors is less than the number of sources, SCA is a more practical tool to separate the sources. In such cases, the number of sources active at each time instant should generally be at most one [35]. This particular situation is termed sparsity of the sources. In 2000, the work of Bofill and Zibulevsky [36] illustrated that SCA can solve under-determined BSS by exploiting the sparsity of the sources in their time-frequency representation, obtained by applying the short-time Fourier transform. Consequently, SCA attracted much attention and attained a much wider audience. The concept of SCA had already been exploited in the mid-1990s. The fundamental assumption of SCA can be extended to data which can be sparsified by a given transformation such as the Fourier transform, the wavelet transform, and so forth. SCA estimates the basis vectors of the mixing matrix by exploiting the geometric constraint imposed by sparsity, followed by sparse source recovery. More recently, SCA has been improved by introducing k-sparse component analysis (k-SCA), where the sparsity condition is relaxed to having more than one active source at each time instant [37], [38], [39], [40], [41]. On the other hand, some under-determined blind identification methods benefit from the tensor factorization concept to estimate the mixing system [7], [8], [42]. It should be noted that most ICA and SCA algorithms do not take into account the noise component v_i(t) in the mixing models of Table 2.1. In other words, they ignore the noise term in their source separation process, as can be seen in Table 2.2. In the following sections, ICA is defined and a survey of the existing methods is provided; likewise for TF in a later subsection.
Here, we briefly review Infomax (derived from information maximization), the fast fixed-point algorithm for independent component analysis (FastICA), second order blind identification (SOBI), and joint approximate diagonalization of eigenmatrices (JADE). These methods are respectively based on minimization of the mutual information, maximization of the negentropy of every estimated signal, diagonalization of a set of time delayed covariance matrices, and minimization of the sum of the squared cross-cumulants of the estimates. Some of these methods will be used later to evaluate the performance of the proposed methods. Most of the above methods need zero mean data as their input, and some of them also need whitened inputs. So, in the context of BSS, whitening is introduced as a pre-processing step, which is normally done by principal component analysis (PCA) and is briefly explained in the next subsection.


2.3 Whitening process using PCA

In the context of BSS, PCA seeks to remove the cross-correlation between the observed signals and ensure that they have unit variance. It operates by finding the projections of the mixture data in orthogonal directions of maximum variance [9]. A vector z is said to be spatially white if

E{z z^T} - I = 0    (2.3)

where E{·} denotes the expectation operator and I is the identity matrix. In the BSS problem the separating matrix W can be decomposed into two components, i.e.

W = U V (2.4)

where V is the whitening matrix and U is a rotation matrix [43]. The whitening matrix V can be computed as follows:

V = Q^{-1/2} E^T    (2.5)

where E is the eigenvector matrix of the covariance matrix of x(t) at zero lag, C_x = E{x(t) x(t)^T}, and Q is a diagonal matrix containing the eigenvalues of C_x. This projects the data into an orthogonal space. However, it is important to notice that the whitening matrix V is not unique, because it can be pre-multiplied by any orthogonal matrix to obtain another valid version of V. Moreover, when noise is involved in the observations, there are robust whitening algorithms which take different lagged covariance matrices into account to compute the optimum whitening matrix [44]. Although many ICA algorithms use a pre-whitening step, this process has the disadvantage that the calculations are directly affected by additive Gaussian noise. Recall that ICA is blind to additive Gaussian noise. Errors introduced in the pre-whitening step cannot be completely removed in the subsequent processing (even when it deals with higher-order statistics, which are more robust to additive Gaussian noise) [45].
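A minimal NumPy sketch of the whitening step in (2.5): compute the zero-lag covariance, take its eigendecomposition, and form V = Q^{-1/2} E^T. The correlated test data here is an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(2)
# correlated zero-mean data with covariance [[4, 1.5], [1.5, 2]]
L = np.linalg.cholesky(np.array([[4.0, 1.5], [1.5, 2.0]]))
x = L @ rng.standard_normal((2, 20000))
x -= x.mean(axis=1, keepdims=True)

Cx = x @ x.T / x.shape[1]          # sample covariance C_x = E{x x^T}
q, E = np.linalg.eigh(Cx)          # eigenvalues q (diagonal of Q), eigenvectors E
V = np.diag(q ** -0.5) @ E.T       # whitening matrix V = Q^{-1/2} E^T, eq. (2.5)

z = V @ x                          # whitened data: E{z z^T} = I
```

By construction V C_x V^T = I exactly, illustrating (2.3); as noted above, V is not unique, since U V is an equally valid whitener for any orthogonal U.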

2.4 ICA based algorithms

Independent component analysis is a statistical approach to decompose multivariate mixtures into components that are statistically as independent as possible. Basically, ICA


has been designed for the instantaneous BSS model [9], [10], [21]. Later, some researchers developed extensions of ICA for convolutive BSS problems. In effect, ICA implies that the joint probability density function p(s(t)) of the sources can be factorised as:

p(s(t)) = Π_{j=1}^{Ns} p_j(s_j(t))    (2.6)

where p_j(s_j(t)) is the marginal distribution of the jth source. Furthermore, statistical independence of the sources implies uncorrelatedness of the sources, but the reverse is not necessarily true. As mentioned, most ICA algorithms decorrelate the mixtures via spatial whitening before optimising their separating criteria or cost functions. As a starting point for introducing ICA based algorithms, we describe Infomax [34]. This method, developed by Bell and Sejnowski, endeavours to maximise the statistical independence by minimising the mutual information between the estimated sources. Two variables y_1 and y_2 are called statistically independent whenever their mutual information

is zero. In the Infomax algorithm, in order to minimise the mutual information, the output entropy is maximised [46]. Assume x is the input to the neural network and y is the output vector. The following equation describes the relation between the inputs and the outputs of the network:

y_j = ψ_j(w_j^T x) + n_j    (2.7)

where ψ_j are nonlinear scalar functions which govern the activity of the output neurons, w_j are the link weight vectors, and n = [n_1, n_2, ..., n_{Ns}]^T is the additive Gaussian white noise vector. The entropy of the output is given by

H(y) = H(ψ_1(w_1^T x), ψ_2(w_2^T x), ..., ψ_{Ns}(w_{Ns}^T x))    (2.8)

For a typical invertible transformation y = f(x) of the random vector x, the relationship between the entropies of y and x can be expressed as:

H(y) = H(x) + E{log |det J_f(x)|}    (2.9)


where J_f(·) is the Jacobian matrix of the function f(·) [46]. So, the transformation of the entropy in (2.9) can be rewritten as

H(y) = H(x) + E{log |det(∂y/∂x)|}    (2.10)

H(y) can then be derived and simplified as follows:

H(y) = H(x) + Σ_{j=1}^{Ns} E{log ψ'_j(w_j^T x)} + log |det(W)|    (2.11)

The above formulation demonstrates that Infomax can be equivalent to maximum likelihood estimation if the nonlinear functions are chosen as the cumulative distribution functions corresponding to the densities p_j of the sources, i.e. ψ'_j = p_j. Therefore, any method for maximizing the likelihood can be used here to maximize the entropy of the neural network output. Gradient, natural gradient, and fast fixed-point algorithms have been proposed to find the maximum of the likelihood function [46]. Here, we discuss the Bell & Sejnowski algorithm as the simplest algorithm, obtained by the gradient method [47]. Using the stochastic gradient of the log-likelihood expression in (2.11), the update rule of the neural network weight matrix is

ΔW ∝ [(W^T)^{-1} - φ(y) x^T]    (2.12)

where φ(y) is a nonlinear function represented by a column vector whose jth element is

φ_j(y_j) = - p'_{y_j}(y_j) / p_{y_j}(y_j)    (2.13)

in which p_{y_j} denotes the approximated probability density function (pdf) of the jth source signal.
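The update (2.12) can be sketched as follows. The score function φ is taken here as tanh(·), a common choice for super-Gaussian sources; this choice, the toy Laplacian sources, the mixing matrix, and the step size are illustrative assumptions, not values prescribed by the text:

```python
import numpy as np

def infomax_step(W, x, lr=0.05):
    """One gradient step of (2.12): dW ~ (W^T)^{-1} - phi(y) x^T.

    phi(y) = tanh(y) is an assumed score, suitable for super-Gaussian sources.
    """
    y = W @ x
    grad = np.linalg.inv(W.T) - np.tanh(y) @ x.T / x.shape[1]
    return W + lr * grad

# toy demo: two Laplacian (super-Gaussian) sources, instantaneously mixed
rng = np.random.default_rng(3)
s = rng.laplace(size=(2, 4000))
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s

# zero-mean, pre-whitened input, as the algorithm expects
x -= x.mean(axis=1, keepdims=True)
q, E = np.linalg.eigh(x @ x.T / x.shape[1])
z = np.diag(q ** -0.5) @ E.T @ x

W = np.eye(2)
for _ in range(3000):
    W = infomax_step(W, z)
y = W @ z                           # estimated sources, up to order and scale
```

On this toy problem the full-batch gradient iteration recovers each source up to the permutation, scaling, and sign ambiguities discussed in Section 2.1.1.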

Another approach to maximise the statistical independence of the estimated sources is to maximise non-Gaussianity. FastICA is inspired by the Central Limit Theorem, according to which the distribution of a sum of independent random variables tends to a Gaussian distribution [9]. Based on this theorem, it is assumed that the distribution of the mixtures is closer to a Gaussian distribution than that of the individual sources. Therefore, statistical independence and non-Gaussianity are treated as equivalent in this context. However,


this results in the main limitation of the method, namely that at most one source can possess a Gaussian distribution. Negentropy is a non-negative function which quantifies how much a random variable deviates from Gaussianity. This function can be formulated as [46]:

N(y) = H(y_g) - H(y)    (2.14)

where H(·) denotes the entropy of the enclosed term and y_g is a Gaussian random variable with the same variance as y. From the properties of the entropy function it follows that negentropy is always non-negative. Due to the computational complexity of calculating negentropy, Hyvärinen et al. proposed to use the following approximation instead [46]:

N(y) ∝ [E{g(y)} - E{g(y_g)}]^2    (2.15)

where g(·) can be any non-quadratic function. Choosing a g(·) that does not grow too fast provides more robustness. Two choices of g(·), namely g(u) = (1/a_1) log cosh(a_1 u) and g(u) = -exp(-u^2/2), where 1 ≤ a_1 ≤ 2, have been used [46]. Differentiating (2.15) with respect to the separating vector w_j corresponding to the jth source yields:

Δw_j = α E{z g'(w_j^T z)}    (2.16)

where α = E{g(w_j^T z)} - E{g(y_g)}, g'(u) = ∂g(u)/∂u, and z is the whitened mixture. The following fixed point iteration is then suggested intuitively:

w_{j,k+1} ← E{z g'(w_{j,k}^T z)}    (2.17)

As the convergence of the fixed point iteration in (2.17) is not satisfactory, a Lagrangian approach is employed in [46] to yield a convergent fixed point iteration as follows:

w_{j,k+1} ← E{z g'(w_{j,k}^T z)} - E{g''(w_{j,k}^T z)} w_{j,k}    (2.18)

where g''(u) = ∂g'(u)/∂u. Because w_j appears on both sides of the above updating equation, it must be normalized at each iteration. This method has been widely used in biomedical signal processing [48], [49], [50], relying on the fact that most natural signals are non-Gaussian. For both the FastICA and Infomax algorithms the data is pre-whitened. Although these algorithms are popular, from a theoretical point of view they are not optimal, see e.g. [51]. Furthermore, a recent article [52] illustrated that both algorithms exploit sparseness more than independence of the sources in the application of extracting brain sources from functional magnetic resonance imaging (fMRI) signals.
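A one-unit sketch of the fixed point iteration (2.18), written for g(u) = log cosh(u), so that g'(u) = tanh(u) and g''(u) = 1 - tanh^2(u); the toy sources, mixing matrix, and function name are arbitrary illustrative choices:

```python
import numpy as np

def fastica_unit(z, w0, iters=200, tol=1e-10):
    """One-unit FastICA: w <- E{z g'(w^T z)} - E{g''(w^T z)} w, then normalize.

    Here g(u) = log cosh(u), so g'(u) = tanh(u) and g''(u) = 1 - tanh(u)^2.
    """
    w = w0 / np.linalg.norm(w0)
    for _ in range(iters):
        u = np.tanh(w @ z)
        w_new = (z * u).mean(axis=1) - (1.0 - u ** 2).mean() * w
        w_new /= np.linalg.norm(w_new)        # renormalize at each iteration
        if abs(abs(w_new @ w) - 1.0) < tol:   # converged, up to sign
            return w_new
        w = w_new
    return w

# toy demo: extract one of two mixed Laplacian sources
rng = np.random.default_rng(4)
s = rng.laplace(size=(2, 5000))
x = np.array([[1.0, 0.7], [0.3, 1.0]]) @ s
x -= x.mean(axis=1, keepdims=True)
q, E = np.linalg.eigh(x @ x.T / x.shape[1])
z = np.diag(q ** -0.5) @ E.T @ x              # pre-whitened mixtures

w = fastica_unit(z, rng.standard_normal(2))
y = w @ z                                      # one estimated source
```

Further sources can then be extracted by deflation, i.e. repeating the iteration while keeping each new w orthogonal to the previously found vectors.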

2.4.1 Joint diagonalization based ICA methods

If the underlying sources in an ICA problem are not white random variables, i.e. they are natural time series or synthetic signals with a particular time structure, higher order statistics can be used for the estimation of the model. The assumption that the components are independent implies that they have no spatial, temporal or spatial-time-frequency dependencies [53]. Hence, it is possible to capture the temporal dependency of the measurements using a set of square matrices, estimating the unmixing matrix as the approximate joint diagonaliser of these matrices [53]. Surprisingly, these second-order techniques are able to estimate the model where ICA methods fail, for example when the sources are temporally correlated but have Gaussian distributions [46].

The covariance matrix of the observation vector at delay zero, i.e. C_x = E{x(t) x(t)^T}, does not provide enough parameters to allow estimation of the mixing matrix, because diagonalising this covariance produces white but not necessarily independent signals [46]. However, the unmixing matrix can be estimated if the temporal structure is taken into account in the form of a set of lagged covariances or other higher order statistics [46]. The lagged covariance matrices, the simplest second order type of time structure, can be computed as C_x^τ = E{x(t) x(t - τ)^T}, where τ denotes the time delay in samples. Based on the above definition, the relation between the cross-covariance matrices of the observations and the source signals can be written as

C_x^τ = A C_s^τ A^T    (2.19)

where A is the mixing matrix and C_s^τ denotes the lagged cross-covariance of the sources [53]. If the sources are Gaussian, uncorrelatedness at second order can be considered equivalent to independence [54]. So, the unmixing matrix W can be estimated such that the matrix W C_x^τ W^T, which is an estimation of C_s^τ, is as diagonal as possible [53]. The


diagonality may be quantified, for example, in terms of the magnitude of the non-diagonal elements [53]. These methods require the components to have distinct and independent power spectra [55], which implicitly implies their non-stationarity. This approach was originally applied with only two time lags, on the basis of two consecutive eigenvalue decompositions, in a method called the Algorithm for Multiple Unknown Signals Extraction (AMUSE) [56]. In this method, it is assumed that z(t) is the whitened version of the observation vector x(t). In the next step, the eigenvalue decomposition of D_τ = (C_z^τ + (C_z^τ)^T)/2 for one arbitrary τ is calculated. The rows of the separating matrix are given by the eigenvectors of D_τ. Although this algorithm is very simple and fast, it only works when the eigenvalues of the matrix are all distinct, which is not always guaranteed.
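The AMUSE procedure above can be sketched in a few lines. The two sinusoidal test sources are chosen here only because they have distinct spectra, as the method requires; they and the mixing matrix are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(8000)
# two sources with distinct power spectra
s = np.vstack([np.sin(0.05 * t), np.sin(0.17 * t + 1.0)])
x = np.array([[1.0, 0.8], [0.5, 1.0]]) @ s + 1e-3 * rng.standard_normal((2, t.size))

# step 1: whiten the observations
x -= x.mean(axis=1, keepdims=True)
q, E = np.linalg.eigh(x @ x.T / x.shape[1])
z = np.diag(q ** -0.5) @ E.T @ x

# step 2: symmetrized lagged covariance D_tau = (C_z^tau + (C_z^tau)^T)/2
tau = 1
C = z[:, tau:] @ z[:, :-tau].T / (z.shape[1] - tau)
D = 0.5 * (C + C.T)

# step 3: the eigenvectors of D_tau give the rows of the separating matrix
_, U = np.linalg.eigh(D)
y = U.T @ z                        # estimated sources, up to order and sign
```

For these sources the eigenvalues of D_τ are distinct (roughly cos(0.05) and cos(0.17)), so the eigenvectors recover the sources; when eigenvalues coincide, the method fails, which motivates SOBI below.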

Second order blind identification

To benefit from uncorrelatedness at more than two time lags, an extension of the AMUSE algorithm considers the covariance matrices at several time delays; it reduces the dependency on the choice of time delays and hence improves the performance of AMUSE. This method, called SOBI, simultaneously diagonalizes the covariance matrices calculated at different time delays [17]. Although in practice it is not possible to perfectly diagonalize all the matrices, the objective is to minimize the value of the following cost function:

J(W) = Σ_{τ∈F} off(W C_x^τ W^T)    (2.20)

where F is a set of arbitrarily chosen time lags and off(·) is the sum of the squared off-diagonal elements. The process of minimizing the above cost function is called joint diagonalization. The unmixing matrix estimated by this method is a unique unitary matrix under mild conditions (diagonality of the C_s^τ and independence of their power spectra), by the Essential Uniqueness of Joint Diagonalization theorem [17]. The diagonalization process is carried out by multiplicative orthogonal rotation matrices; Jacobi and Givens rotation matrices are mostly used to estimate the diagonalizer or unmixing matrix in BSS.
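A compact sketch of such a Givens-rotation joint diagonalizer, following the Jacobi-angles scheme of Cardoso and Souloumiac; the function names are ours, and this is a simplified real-symmetric version, not the thesis's implementation:

```python
import numpy as np

def off(C):
    """off(.) of (2.20): sum of squared off-diagonal elements."""
    return float(np.sum(C ** 2) - np.sum(np.diag(C) ** 2))

def joint_diag(mats, sweeps=100, tol=1e-12):
    """Approximately jointly diagonalize symmetric matrices by Givens rotations.

    Returns an orthogonal V and the rotated copies V C_k V^T.
    """
    C = [M.astype(float).copy() for M in mats]
    n = C[0].shape[0]
    V = np.eye(n)
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Jacobi angle minimizing the off-criterion on the (p, q) plane
                g1 = np.array([M[p, p] - M[q, q] for M in C])
                g2 = np.array([M[p, q] + M[q, p] for M in C])
                ton = g1 @ g1 - g2 @ g2
                toff = 2.0 * (g1 @ g2)
                theta = 0.5 * np.arctan2(toff, ton + np.hypot(ton, toff))
                c, s = np.cos(theta), np.sin(theta)
                if abs(s) > tol:
                    rotated = True
                    R = np.array([[c, s], [-s, c]])
                    for M in C:                    # M <- J M J^T on rows/cols p, q
                        M[[p, q], :] = R @ M[[p, q], :]
                        M[:, [p, q]] = M[:, [p, q]] @ R.T
                    V[[p, q], :] = R @ V[[p, q], :]
        if not rotated:
            break
    return V, C
```

On a synthetic set C_k = A D_k A^T with orthogonal A and diagonal D_k, the returned V recovers A^T up to row permutation and sign, and the off(·) cost of every rotated matrix drops to numerical zero; in SOBI the same sweeps are applied to the whitened lagged covariance matrices.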

Joint approximate diagonalization of eigenmatrices

Another approach in ICA consists of using higher order cumulant tensors, which is sometimes referred to as ICA by tensorial methods using higher-order cumulant tensors [46],


[53]. Tensors are generalizations of matrices; hence, cumulant tensors are generalizations of the second-order covariance matrices. In analogy to the SOBI method, making the fourth-order cross-cumulants zero or as close to zero as possible implies statistical independence of the sources [18], [53]. In this case, an idea similar to whitening the data by using the eigenvalue decomposition of the covariance matrices can be used [46]. The entries of the cumulant tensor are the fourth order cross cumulants of the data, i.e. Cum(x_i, x_j, x_k, x_l), where 1 ≤ i, j, k, l ≤ Nx. The key property of fourth order statistics used in ICA methods is that if the sources are independent, all the cumulants with at least two different indices are zero. The cumulant tensor defines a linear transformation F = {f_ij} in the space of Nx × Nx matrices, where Nx is the number of observations. The ijth element of the matrix given by the transformation is defined as:

f_ij(M) = Σ_{k,l} m_kl Cum(x_i, x_j, x_k, x_l)    (2.21)

where m_kl is the (k, l)th element of the matrix M and Cum(·) is the fourth order cumulant operator. The matrix M is, by definition, an eigenmatrix of the cumulant tensor if

F(M) = λM    (2.22)

which implies f_ij(M) = λ m_ij, where λ is an eigenvalue [57]. On the other hand, for a whitened observation vector z we have:

z = V A s = W^T s    (2.23)

where W^T = V A is the whitened mixing matrix. If we assume that w_k is the kth row of W, it can be proved that

f_ij(w_k^T w_k) = w_ki w_kj Kurt(s_k)    (2.24)

where Kurt(·) denotes the kurtosis operator. Comparing (2.24) and (2.21) implies that every w_k^T w_k is an eigenmatrix of the cumulant tensor, and its corresponding eigenvalue is equal to the kurtosis of the corresponding independent source. So, in the separation process, if the kurtoses of the independent components, which are the eigenvalues of the tensor, are distinct, every eigenmatrix will provide one of the columns of the whitened mixing matrix. However, in practice the eigenvalues may not be distinct and therefore the eigenmatrices are


linear combinations of the matrices w_k^T w_k. Therefore, a secondary process is necessary to estimate the eigenmatrices of the cumulant tensor of the observations. JADE is one of the well established methods to solve this problem of degenerate eigenvalues of the cumulant tensor. The eigenvalue decomposition can be viewed as diagonalization; therefore, assuming that the ICA model holds, the problem is approached by assuming that the matrix W is the separating matrix. In this case, W diagonalizes F(M), i.e. W F(M) W^T is diagonal for any M, because the matrix F(M) is a linear combination of the eigenmatrix terms w_k^T w_k. Thus, we have to choose a set of different matrices M_i and try to diagonalize the matrices W F(M_i) W^T as much as possible. In practice, it is not possible to exactly diagonalize the whole set of matrices. The best choices for the matrices M_i are the eigenmatrices of the cumulant tensor of the whitened data; the first Ns significant eigenpairs are usually selected. Similar to the SOBI algorithm, the diagonality of the matrices Q_i = W F(M_i) W^T can be measured using the off(·) operator. The extended Jacobi technique for simultaneous diagonalization [18] is used to jointly diagonalize the matrices W F(M_i) W^T. On the other hand, the algebra of higher-order tensors, called multilinear algebra, offers solutions for joint diagonalization using tensor decomposition methods known as canonical decomposition (CANDECOMP) [58] or the parallel factors model (PARAFAC) [59]. It has been shown that tensor factorization can also be used to tackle the simultaneous diagonalization problem. De Lathauwer [20] reported that there is a link between simultaneous matrix diagonalization and tensor factorization (TF), in particular PARAFAC, and he proved that the unique canonical components can be obtained by simultaneous matrix diagonalization.
Tensor factorization has several advantages compared to two-way matrix factorization, such as uniqueness of the optimal solution and component identification. Furthermore, the multidimensional structure of some data, e.g. fMRI, can be directly taken into account, which would otherwise be lost when analysing the data by matrix factorization approaches. Recently, in addition to signal processing, tensor decomposition has come into frequent use in many other fields [2].

In chemistry, tensor factorisation has been used to recover unique chemical compounds from sampled mixtures using a fluorescence spectroscopy model [60]. In psychometrics it has been used to address unsupervised clustering of the behaviours of different subjects under different conditions [61]. In computer vision, tensor factorisation enables the extraction of common patterns (e.g. eigenfaces) for face recognition [62], [63]. Moreover, in bioinformatics, tensor factorization has opened a new path towards understanding of cellular states and biological processes [64]. More applications of tensor factorization are explored in [26]. Before exploring more applications of tensor factorisation and developing new algorithms, a brief review of the basic concepts is provided in the following section, where we first introduce some basic definitions of tensor algebra with a closer look at the PARAFAC model and its extensions.

Figure 2.1: A third order tensor and its modes [1].

2.5 Multi-way data representation and factorization

In the multi-way context, a tensor is used to describe a set of data whose elements can be arranged as a multi-dimensional or multi-way array. In analogy to a matrix, each direction in a multi-way array is called a way or a mode. A tensor with only one way is a first-order tensor or a vector, and one with two indices is a two-way tensor or a matrix. For a third-order tensor with three indices, the elements can be arranged in a box as shown in Figure 2.1. Similar to two-way tensors, which have rows and columns, three-way tensors have fibres/tubes and slices/slabs, which are shown in Figure 2.2.

Each slab/slice is obtained by keeping one mode fixed. For example, for the third-order tensor shown in Figure 2.2(b), the third mode is considered fixed, and in this case the slab is called a frontal slab. The slabs related to the other modes are called vertical and horizontal slabs.


Figure 2.2: (a) Columns, rows, and tubes of a tensor and (b) horizontal, lateral, and frontal slabs of a tensor [2].

2.5.1 Special tensors

Similar to square, symmetric, and diagonal matrices in the two-way case, there are cubical, supersymmetric, and superdiagonal tensors in the multi-way case. A cubical tensor is a tensor whose modes all have the same size, i.e. X ∈ R^{I×I×I} [2]. A supersymmetric cubical tensor has the same elements under any permutation of its indices, i.e. for a three-way supersymmetric tensor X ∈ R^{I×I×I}, x_ijk = x_ikj = x_jik = x_jki = x_kij = x_kji for all i, j, k = 1, ..., I. Moreover, similar to diagonal matrices, a superdiagonal tensor X ∈ R^{I_1×I_2×I_3×...×I_N} has non-zero elements only where i_1 = i_2 = ... = i_N. Figure 2.3 shows a three-way superdiagonal tensor.

Figure 2.3: A three-way superdiagonal tensor [2].

2.5.2 Unfolding the tensors

One of the important concepts in multi-way analysis is unfolding, or matricization, which is used for transforming a tensor into a matrix. Unfolding is accomplished by concatenating the tensor's slices along different dimensions.
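For instance, the mode-n unfolding of a third-order tensor places the mode-n fibres as the columns of a matrix. A NumPy sketch follows; note that the exact column ordering varies between conventions in the literature, and the helper name is ours:

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: bring 'mode' to the front, then flatten the rest."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

X = np.arange(24).reshape(2, 3, 4)   # a small 2 x 3 x 4 third-order tensor
X0 = unfold(X, 0)                    # 2 x 12 matrix
X1 = unfold(X, 1)                    # 3 x 8 matrix
X2 = unfold(X, 2)                    # 4 x 6 matrix
```

Each unfolding has as many rows as the size of the chosen mode, with the remaining modes merged into the columns.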
