E
S
E
A
R
C
H
R
E
P
R
O
R
T
I
D
I
A
P
Rue du Simplon 4
IDIAP Research Institute
1920 Martigny − Switzerland
www.idiap.ch
Tel: +41 27 721 77 11
Email: [email protected]
P.O. Box 592
Fax: +41 27 721 77 12
Generative Temporal ICA for
Classifi ation in
Asyn hronous BCI Systems
Silvia Chiappa and David Barber
IDIAPRR 05-08
Generative Temporal ICA for Classifi ation in
Asyn hronous BCI Systems
Silvia Chiappa and DavidBarber
February 2005
Abstra t.InthispaperweinvestigatetheuseofatemporalextensionofIndependentComponent
Analysis(ICA)for thedis rimination ofthreementaltasks for asyn hronousEEG-basedBrain
ComputerInterfa e systems. ICA is most ommonly usedwith EEGfor artifa t identi ation
withlittleworkontheuseofICAfordire tdis riminationofdierenttypesofEEGsignals. In
are entworkwehaveshownthat,byviewingICAasagenerativemodel,we anuseBayes'rule
toforma lassierobtainingstate-of-the-artresultswhen omparedtomoretraditionalmethods
basedon using temporalfeatures as inputs to o-the-shelf lassiers. However, inthat model
noassumptiononthe temporalnatureof theindependent omponentswas made. Inthis work
wemodelthehidden omponentswithanautoregressivepro ess inordertoinvestigatewhether
1 Introdu tion
EEG-basedBrainComputerInterfa e(BCI) systemsallowaperson to ontrol devi esby using
ob-served ele tri al a tivity
v
j
t
, at timet
, re orded by ele trodes pla ed over the s alp at lo ationsj = 1, . . . , V
. In the ase of systemsbased onspontaneous brain a tivity, the user on entratesondierentmentaltasks(e.g. imaginationofhandmovement)whi hareasso iatedwithdierentdevi e
ommands. Tasksarenormallysele tedsothattask-dependentareasinthebrainbe omea tive. The
mostprominent hara terizationofa tivityistheattenuationofrhythmi omponents,mostlyinthe
α
band. Standardapproa hesextra tthefrequen y ontentofthesignal,whi histhenpro essedbyastati lassier(see[12℄forageneralintrodu tiononBCIresear h).
Signalsre ordedats alpele trodesare ommonly onsideredasalinearandinstantaneous
super-positionofunobservedorhiddenele tromagneti a tivity
h
i
t
generatedbyindependentbrainpro esses,i = 1, . . . , H
. For these reasons Independent Component Analysis (ICA) [6℄ seemsan appropriatemodel ofEEGsignalsand hasbeenextensivelyappliedtorelatedtasks, su h astheidenti ationof
artifa ts([7,11℄)andtheanalysisoftheunderlyingbrainsour es.
Morespe i ally relatedto BCIresear h, severalstudies haveaddressedthe issueof whetheran
ICA de omposition an enhan edieren es in themental taskssu hasto improvethe performan e
ofbrain-a tuatedsystems. Mostofthesestudiesusestati versionsofICAeitherasaformof
prepro- essing,ortoaidanalysisofthesignal. In ontrasttoourapproa hbelow,theydonotuseICAitself
to dire tly form a lassier. In [8℄, theauthors analyze avisual attentiontask andshowthat ICA
nds
µ
- omponentswhi hshowaspe tralrea tivitytomotoreventsstrongerthantheone measuredfroms alp hannels. TheysuggestthatICA anbeusedforoptimizingbrain-a tuated ontrol. In[3℄
ICA is used foranalyzing EEGdata re ordedfrom subje tswhi h attempt to regulatepowerat 12
Hz overthe left-right entral s alp. Other studiesuse ICA asadenoising te hnique orasafeature
extra torfor improvingthe performan e of aseparate lassier. Forexample, in [4℄ ICA is used to
removeo ularartefa ts,while[5℄extra tstask-relatedindependent omponentspriortheappli ation
of several lassiers. In ontrasttothese approa hes,in [10℄ theauthorsintrodu ea ombinationof
HiddenMarkovModelsandIndependentComponentAnalysisasagenerativemodeloftheEEGdata
andgiveademonstrationofhowthismodel anbeapplieddire tlytothedete tionofwhenswit hing
o ursbetweenthetwomental onditionsofbaselinea tivityandimaginarymovement.
Followingasimilarapproa h,inare entwork[2℄wehaveuseddire tlyasimplestati ICA
genera-tivemodelofEEGsignalsasa lassierforthere ognitionofthreementaltasks. Wehaveshownthat
aperforman e similarto standardapproa hesbasedonusing temporal featuresasinputsto
o-the-shelf lassiers anbeobtained. Itisstillanopenquestionwhetherwe andobetterbyusingamore
omplexmodel ofthedata,sin ein [2℄thetemporal natureoftheindependent omponentswasnot
takeninto a ount. Temporal modeling of thehidden omponents, for examplewith autoregressive
models[9℄,hasshowntoimproveseparationinthe aseofothertypesofre ordings.
In this paper we further investigate the use of ICA for lassi ation by modeling ea h hidden
omponentwithanautoregressivepro ess. Ourinterestistoassesperforman einexperimentswhi h
are loseto thereal useofaBCIsystem. Rather thanusing asyn hronousproto ol,in oursystem
thesubje tperformsrepetitivemovementsandwordgenerationinaself-pa edmanner,withoutbeing
syn hronizedtoanexternal ue.
Ourapproa histot,forea hperson,anICAgenerativemodelto ea hseparatetask,andthenuse
Bayes'ruletoformdire tlya lassier. Thismodelwillbe omparedwithitsstati spe ial ase,where
notemporalinformationis takeninto a ount,andwithtwostandardte hniquesforthere ognition
of mental tasks: theMultilayerPer eptron(MLP) andSupport Ve torMa hine(SVM) [1℄, trained
withpowerspe traldensityfeatures.
2 Generative Temporal Independent Component Analysis
Generative Independent Component Analysis is aprobabilisti model in whi h ave tor of
h1
t
−1
h1
t
h1
t+1
. . .h
H
t
−1
h
H
t
h
H
t+1
vt
−1
vt
vt+1
Figure 1: Graphi al representationof anICA model withtemporaldependen e betweenthehidden
variables(oforder
p = 1
).instantaneouslineartransformation:
v
t
= W h
t
+ ǫ
t
,
where
ǫ
t
is noise. Forreasons of tra tability, in ourmodel (andothers in the literature)ǫ
t
will beassumedtobezerothroughout, and
W
willbeassumedto beasquarematrix.Like in Contextual ICA [9℄ and HMMICA [10℄, we assume temporal dependen e between the
hiddenvariables
h
t
bymodelingthei
th
hiddenbrainpro ess
h
i
t
withalinearautoregressivemodeloforder
p
:h
i
t
=
p
X
k=1
a
i
k
h
i
t−k
+ η
i
t
= ˆh
i
t
+ η
i
t
,
whereη
i
t
is the noise term. Graphi ally, the Bayesian network whi h orresponds to this model isshowninFig. 1.
Ouraim willbetot amodeloftheaboveform toea h lassof task
c
. Inorder todothis, wewilldes ribe the model as ajointprobability distribution, and use maximum likelihood asthe training
riterion.
Given theaboveassumptions, we anfa torize the density of the observedand hidden variables as
follow 1 :
p(v
1:
T
, h
1:
T
|c) =
T
Y
t=1
p(v
t
|h
t
, c)
H
Y
i=1
p(h
i
t
|h
i
t−1:t−p
, c) .
(1)Using
p(v
t
|h
t
) = δ(v
t
− W h
t
)
we an easily integrate (1) over the hidden variablesh
t
to form thelikelihoodoftheobservedsequen e
v
1:
T
:p(v
1:
T
|c) = | det W
c
|
−T
T
Y
t=1
H
Y
i=1
p(h
i
t
|h
i
t−1:t−p
, c) ,
(2) whereh
t
= W
−1
c
v
t
. Wewillmodelp(h
i
t
|h
i
t−1:t−p
, c)
withthegeneralizedexponentialdistribution:p(h
i
t
|h
i
t−1:t−p
, c) =
f (α
ic
)
σ
ic
exp
− g(α
ic
)
h
i
t
− ˆ
h
i
t
σ
ic
α
ic
,
wheref (α
ic
) =
α
ic
Γ(3/α
ic
)
1
/2
2Γ(1/α
ic
)
3
/2
,
g(α
ic
) =
Γ(3/α
ic
)
Γ(1/α
ic
)
α
ic
/2
1Figure 2: Generalized exponential distributionfor
α = 2
(solidline),α = 1
(dashedline) andα =
100
(dottedline),whi h orrespondtoGaussian,Lapla ianandapproximatelyuniformdistributionsrespe tively.
and
Γ(·)
istheGammafun tion. Thegeneralizedexponentialfamilyen ompassesmanytypesofsym-metri andunimodaldistributions. Theparameter
σ
isthestandarddeviation2
,while
α
determinesthesharpnessofthedistribution,asshownin Fig. 2.
The logarithm of the likelihood (2) is summed over all training sequen es belonging to ea h lass
and then maximized by using the s aled onjugategradient method des ribed in [1℄. This requires
omputing the derivatives with respe t to all the parameters, that is, the mixing matrix
W
c
, theautoregressive oe ients
a
i
k
, and the parameters of the exponential distributionσ
ic
and
α
ic
(seeAPPENDIX).
Aftertraining,anoveltestsequen e
v
∗
1:
T
is lassiedusingBayes'rulep(c|v
∗
1:
T
) ∝ p(v
1:
∗
T
|c)
,assumingp(c)
isuniform.3 Experimental Setup
EEGpotentialswerere ordedwiththeBiosemi A tiveTwosystem(http://www.biosemi. om),using
32ele trodeslo atedatstandardpositionsofthe10-20InternationalSystem,atasamplerateof512
Hz. The raw potentials were re-referen edto the Common Average Referen e in whi h theoverall
meanisremovedfromea h hannel. Subsequently,theband6-16Hzwassele tedwithaButterworth
lter. This prepro essing lteris a simplewayto removestrong driftterms in the signals(the
so- alled DC level)and the50 Hz noise,whi h are artifa tsof instrumentation and donot orrespond
to braina tivity. Experimentally,wealsofound thatremovingfrequen iesoutsidetheband 6-16Hz
robustied the performan e. Only the following 19ele trodes were onsidered for theanalysis: F3,
FC1,FC5,T7,C3,CP1,CP5,P3,Pz,P4,CP6,Cp2,C4,T8,FC6,FC2,F4, FzandCz.
Thedata were a quired in an unshielded roomfrom twohealthy subje tswithout any previous
experien e withBCIsystems. Duringaninitial daythesubje tslearnedhowto performthemental
tasks. In the following two days, 10 re ordings, ea h lasting around 4 minutes, were a quired for
the analysis. Duringea hre ording session,every20 se onds anoperator instru ted thesubje tto
perform oneof three dierent mental tasks. The tasks were: (1) imagination of self-pa ed left, (2)
righthand movementand(3)mental generationofwordsstartingwithagivenletter.
4 Results
The time series obtained from ea h re ording session was split into segments of signal lasting one
se ond. ICAwas omparedwithtwostandardapproa hes,inwhi hforea hsegmentthepower
spe -tral densitywasextra tedandthenpro essedusinganMLPandaSVM.Thebest performan ewas
obtainedusing thefollowingWel h's periodogrammethod: ea h pattern wasdividedinto aquarter
ofse ondlongwindowswithanoverlapof1/8ofse ond. Thentheoverallaveragewas omputed.
2
Duetotheindetermina yofvarian eofthe
h
i
(
h
i
anbemultipliedbyas alingterm
a
aslongasthe orrespondingolumnof
W
c
ismultipliedby1/a
),σ
ouldbesettooneinthe generalmodeldes ribedabove. Howeverthis annotTherstthreesessionsofea hdaywereusedfortrainingthemodelswhiletheothertwosessions
whereused alternativelyforvalidation andtesting.
A softmax, onehidden layer MLP wastrained using ross-entropy, with the validation set used to
hoose thenumber of iterations, thenumber of
tanh
hidden units (rangingfrom 1to 100) and thelearningrateofthegradientas entmethod.
Inthe SVM,ea h lass wastrainedagainsttheothers, andthestandarddeviation fortheGaussian
SVMfoundusingthevalidationset (rangingfrom1to 20000).
In theICA model, for omputationalexpedien yonly, the data were down-sampled from 512to 64
samplesperse ond. Thevalidationsetwasusedto hoosethenumberof onjugategradientiterations
and theorder
p
of theautoregressivemodel (from1to 8), evenif wehaveobservedthat theappro-priateorderdoesnot hangefordierentsessions. Sin eweassumethatthes alpsignalisgenerated
bylinearmixing ofsour esinthe ortex,providedthedataarea quiredunder thesame onditions,
itwouldseemreasonabletofurtherassumethat themixingisthesameforall lasses(
W
c
≡ W
)andthis onstrainedversionisalso onsidered.
A omparisonoftheperforman eisshowninTable1. BesidestheresultsobtainedwiththeT
em-poralICA model (T.ICA), inwhi h theindependent omponentsare modeledby anautoregressive
pro ess, wepresenttheresults obtainedwitha Stati ICA model (S. ICA), whi h an be seenasa
parti ular aseinwhi htheautoregressiveorder
p
issettozero. Classi ationismeasuredonaround420testexamples.
ICA onsistentlyperformsaswellasthetemporalfeatureapproa husingMLPandSVMs. However,
by modeling the independent omponents with an autoregressivepro ess we don't obtain
improve-mentsindis riminationwithrespe ttothestati ase.
ForSubje tA,weusedthethird day'sdatato sele tthethree hidden omponentswhose
distri-bution variedmost a rossthe three lasses, using the ICA model with amatrix
W
ommon to alllasses. IntheStati ICAmodel, thethree omponentswere sele tedbylooking at thedistribution
p(h
i
t
)
,whileintheTemporalICAmodeltheyweresele tedbylookingatthe onditionaldistributionp(h
i
t
|h
i
t−1:t−p
)
forthe orderp
thatgavethebest performan e inthetest set. Theproje tionof ea homponentonthe19s alpele trodes(
i
th
olumnof
W
)givesanindi ationofwhi hpartofthes alpre eivedmorea tivity from that omponent. Thes alpproje tionsand time ourses(300 framesof
thewordtask)ofthesele tedhidden omponentsareshowninFig. 3. Aswe anseefromthe
proje -tions,thereisa orresponden ebetweenthestati omponents(s1, s2,s3)andtemporal omponents
(t1, t2, t3). Thetime oursesare alsoverysimilar. Ingeneralwehavefound ahigh orresponden e
among almost all the19 omponents of the Stati and Temporal ICA model. The omponentsfor
whi h a orresponden e wasnot found don't show dieren esin the autoregressive oe ientsand
in the onditionaldistribution,thusarenotrelevantfordis rimination. Finallynotethat thehidden
omponentsfoundbytheTemporalICA don'tlooksmootheraswewouldexpe t.
Subje tA Subje tB
Day2 Day3 Day2 Day3
S.ICA
W
40.0% 34.8% 28.5% 31.5% T.ICAW
40.2% 36.7% 27.8% 30.8% S.ICAW
c
37.1% 36.0% 25.6% 30.8% T.ICAW
c
38.8% 36.2% 27.1% 28.2% MLP 37.1% 38.1% 30.5% 34.2% SVM 35.1% 38.1% 32.4% 36.6%Table1: Classi ationserrorsforthreementaltasksusingStati ICA,TemporalICA,MLPandSVM.
Comp.
s
1
Comp.s
2
Comp.s
3
Comp.t
1
Comp.t
2
Comp.t
3
0
100
200
300
0
100
200
300
0
100
200
300
0
100
200
300
0
100
200
300
0
100
200
300
Figure3: Proje tiononthes alpofthreehidden omponentsforSubje tA,Day3usingStati ICA
(Comp. s1,Comp. s2,Comp. s3)andTemporalICA(Comp. t1,Comp. t2,Comp. t3)(Fromblueto
red,negativetopositivevalues). Thetopographi plotshavebeenobtainedbyinterpolatingthevalues
attheele trode(bla kdots)usingtheopensour eeeglabtoolbox(http://www.s n.u sd.edu/eeglab).
Belowthe proje tions,time ourses (300 frames) of the orresponding hidden omponents. Due to
theindetermina yof varian eofthehidden omponents,axes s alebetweendierentgures annot
be ompared andhasbeenremoved. Thisalsoappliestotheabsolutes alpproje tion.
5 Con lusions
Inthis workwehavepresented apreliminaryanalysis onthe useof asimpletemporalIndependent
ComponentAnalysismodelforthedis riminationofthreementaltasksforasyn hronousEEG-based
BCIsystems. Unlikestandardstati ICA, whi hassumestemporalindependen e ofthehidden
om-ponents,wehavemodeledea h omponentwithanautoregressivepro ess. Whilethis approa hhas
beensu essfullyappliedtotheseparationofsour esnotwellseparableusingstati ICA,itdoesnot
seemtobringadditionaldis riminantinformationwhenICAisusedasagenerativemodelfordire t
whi h anhandle hangesofregimeintheEEGdynami s.
A knowledgment
ThisworkwassupportedbytheSwissNSFthroughtheNCCRIM2andbythePASCALNetworkof
Ex ellen e,IST-20002-506778,fundedin partbytheSwissOFES.
Theauthorswould liketoa knowledgeDr. S. BengioandC.Dimitrakakisforusefuldis ussions.
APPENDIX
Herewewritethenormalizedlog-likelihoodofasetof
L(c) =
1
S
c
(T − p)
S
c
X
s=1
log p(v
p+1:T
s
|h
s
1:
p
, c) ,
where
s
indi ates thes
th
trainingpatternof lass
c
. We writep(v
s
p+1:T
|h
s
1:p
, c)
, ratherthanthe notationalabuse
p(v
s
1:T
|c)
in the main text, sin e this takes are of the initial time steps whi h would otherwise beproblemati . Inthefollowing,
h
s
t
= W
c
−1
v
s
t
,fort = 1, . . . , T
. Wewant tomaximizeL =
P
c
L(c)
. Droppingthepatternindex
s
,the omponentindexi
andthe lassindexc
wehave:∂L
∂σ
= −
1
σ
+
g(α)α
S(T − p)(σ)
α+1
S
X
s=1
T
X
t=p+1
|h
t
− ˆ
h
t
|
α
,
thatisthemaximumlikelihoodsolutionis:
σ =
“
g(α)α
S(T − p)
S
X
s=1
T
X
t=p+1
|h
t
− ˆ
h
t
|
α
”
1/α
.
Usingthissolutionweobtain:
∂L
∂α
=
1
α
+
1
α
2
Γ(1/α)
′
Γ(1/α)
+
1
α
2
log
“
α
P
S
s=1
P
T
t=p+1
|h
t
− ˆ
h
t
|
α
S(T − p)
”
−
P
S
s=1
P
T
t=p+1
|h
t
− ˆ
h
t
|
α
log |h
t
− ˆ
h
t
|
α
P
S
s=1
P
T
t=p+1
|h
t
− ˆ
h
t
|
α
.
SettingA = W
−1
:∂L
∂A
= −
1
S(T − p)
S
X
s=1
T
X
t=p+1
b
t
v
t
′
+
1
S(T − p)
S
X
s=1
T
X
t=p+1
ˆ
B
t
+ (A
′
)
−1
,
where
b
t
isave torofelementsb
i
t
=
g(α
i
)α
i
(σ
i
)
α
i
sign(h
i
t
− ˆ
h
i
t
)|h
i
t
− ˆ
h
i
t
|
α
i
−1
and
B
ˆ
t
isamatrixofrowsˆ
B
i
t
=
g(α
i
)α
i
(σ
i
)
α
i
sign(h
i
t
− ˆ
h
i
t
)|h
i
t
− ˆ
h
i
t
|
α
i
−1
p
X
k=1
a
i
k
v
t−k
′
.
Finally,thederivativewithrespe ttotheautoregressive oe ient
a
k
is:∂L
∂a
k
=
g(α)α
S(T − p)(σ)
α
S
X
s=1
T
X
t=p+1
sign(h
t
− ˆ
h
t
)|h
t
− ˆ
h
t
|
α−1
h
t−k
.
Referen es
[1℄ C.M.Bishop. NeuralNetworks forPattern Re ognition. OxfordUniv.Press,1995.
[2℄ S. ChiappaandD. Barber. Generativeindependent omponentanalysis forEEG lassi ation.
Te hni alReportIDIAP-RR77,IDIAP Resear hInstitute,Martigny,Switzerland,2004.
[3℄ A.DelormeandS.Makeig.EEG hangesa ompanyinglearnedregulationof12-HzEEGa tivity.
IEEETrans.Neural Syst. Rehabil.Eng.,11:133137,2003.
[4℄ T.Hoya,G.Hori,H.Bakardjian,T.Nishimura,T.Suzuki,Y.Miyawaki,A.Funase,andJ.Cao.
Classi ation of single trial EEG signals by a ombined prin ipal + independent omponent
analysisandprobabilisti neuralnetworkapproa h. InInternationalSymposiumon Independent
Component AnalysisandBlindSignalSeparation,pages197202,2003.
[5℄ C.-IHung,P.-L.Lee,Y.-T.Wu,H.-Y.Chen,L.-FChen,T.-C.Yeh,andJ.-C.Hsieh. Re ognition
of motor imagery ele troen ephalography using independent omponent analysis and ma hine
lassiers. InInternationalConferen e onComputer Graphi s, VisualizationandComputer
Vi-sion,2004.
[6℄ A.Hyvärinen,J.Karhunen,andE.Oja.IndependentComponentAnalysis.JohnWileyandSons,
In .,2001.
[7℄ T.P.Jung,C.Humphries,T.W. Lee,S.Makeig,M.J.M Keown,V. Iragui,andT.Sejnowski.
Extendend ICA removes artifa ts from ele troen ephalogra re ordings. Advan es in Neural
InformationPro essingSystems,pages894900,1998.
[8℄ S. Makeig, S. Engho, T.-P. Jung, and T. J. Senjnowski. A natural basis for e ient
brain-a tuated ontrol. IEEE Transa tions onRehabilitation Engineering,8:208211,2000.
[9℄ B. A. Pearlmutter and L. C. Parra. Maximum likelihood blind sour e separation: A
ontext-sensitive generalizationof ICA. In Advan es in Neural Information Pro essing Systems, pages
613619,1997.
[10℄ W. D. Penny, S. J. Roberts, and R. M.Everson. HiddenMarkov independent omponentsfor
biosignalanalysis. In International Conferen eon Advan es InMedi al Signal andInformation
Pro essing, pages244250,2000.
[11℄ R. Vigário. Extra tion of o ular artefa ts from EEG using independent omponents analysis.
Ele troen ephalography andClini al Neurophysiology, 103:395404,1997.
[12℄ J. R. Wolpaw, N. Birbaumer, D. J. M Farland, G. Pfurts heller, and T. M. Vaughan.