Clustering - Connectivity analysis from EEG phase synchronisation in emotional BCI

3.2 Methodology

3.2.2 Clustering

Once the phase difference matrices along a specific frequency band and time interval are obtained, the next logical step is to investigate whether there is any underlying pattern in the phase differences. In order to discover significant patterns of features in the phase synchronisation obtained from the algorithm described in the previous section, a pattern recognition technique is needed.

The k-means [166] clustering algorithm is the most widely used partitional clustering algorithm. It has applications across a broad range of data mining problems [167], as it is one of the simplest and most efficient algorithms in the field of data clustering.

k-means clustering assumes that the number of underlying clusters is known. It starts by randomly choosing k points as the initial centroids. Subsequently, each point of the initial dataset is assigned to the closest centroid based on a specific proximity measure, which defines the cost function to be minimised. Once the clusters are formed, the centroids of each cluster are updated. These two steps are repeated iteratively until the centroids no longer change or a chosen convergence criterion is achieved.
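
The two alternating steps described above (assignment to the closest centroid and centroid update) can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in this work: the function names, the fixed `seed`, the iteration cap `max_iter` and the tolerance `tol` are assumptions chosen for the example, and points are represented as tuples of floats.

```python
import random


def squared_euclidean(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Basic k-means (Lloyd iterations); returns (centroids, labels)."""
    rng = random.Random(seed)
    # Randomly choose k distinct data points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        labels = [
            min(range(k), key=lambda j: squared_euclidean(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        # Stop when no centroid moved by more than tol (squared distance).
        shift = max(squared_euclidean(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centroids, labels = kmeans(points, 2)
print(labels)
```

Keeping an empty cluster's centroid in place is only one possible policy; re-seeding it at a distant data point is another common choice.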

The cost function selected for this particular study is based on the Euclidean distance as the dissimilarity measure. Other proximity measures that can also be used are the Manhattan distance or the cosine similarity [168]. This choice can significantly affect the centroid assignment and the quality of the final selection. In this case the Euclidean distance was selected, as it is the most popular choice and consequently the most extensively tested option [169].
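
To illustrate why the choice of proximity measure matters, the sketch below (hypothetical helper names, plain Python) compares the three measures mentioned above on a toy point with two candidate centroids. The Euclidean distance and the cosine similarity select the second candidate, while the Manhattan distance selects the first.

```python
import math


def euclidean(a, b):
    """Euclidean distance (the measure adopted in this chapter)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (city-block) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def closest(x, candidates, measure, larger_is_closer=False):
    """Index of the candidate closest to x under the given measure."""
    pick = max if larger_is_closer else min
    return pick(range(len(candidates)), key=lambda j: measure(x, candidates[j]))


x = (1.0, 1.0)
candidates = [(4.0, 1.0), (3.0, 3.0)]
print(closest(x, candidates, euclidean))                                 # 1
print(closest(x, candidates, manhattan))                                 # 0
print(closest(x, candidates, cosine_similarity, larger_is_closer=True))  # 1
```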

$$J(\theta, U) = \sum_{i=1}^{N} \sum_{j=1}^{k} u_{ij} \, \lVert x_i - \theta_j \rVert^2 \qquad (3.10)$$

The cost function used within this chapter is defined in equation 3.10, where $\theta = [\theta_1^T, \ldots, \theta_k^T]^T$ are the cluster representatives (or simply representatives), corresponding to points of the given data space, $\lVert \cdot \rVert$ stands for the Euclidean distance, $x_i$ is the $i$-th element of the dataset $\chi = \{x_1, x_2, \ldots, x_N\}$, and $U = [u_{ij}]$ collects the memberships, with $u_{ij} = 1$ if $x_i$ lies closer to $\theta_j$ than to any other representative and $u_{ij} = 0$ otherwise [170]. In this case the dataset $\chi$ is the complete range of instantaneous phase differences for each pair of EEG electrodes as a function of time, averaged over a particular frequency band of interest and calculated as explained in section 3.2.1.
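
As a worked check of equation 3.10, the sketch below builds the hard membership matrix $U$ (exactly one 1 per row, at the closest representative) and evaluates $J(\theta, U)$ for three toy points. The function names are hypothetical, and breaking ties between equidistant representatives in favour of the lower index is an assumption the text does not specify.

```python
def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def membership_matrix(points, reps):
    """Hard memberships: u_ij = 1 if x_i is closest to theta_j, else 0."""
    U = []
    for x in points:
        d = [sq_dist(x, theta) for theta in reps]
        j_star = d.index(min(d))  # lowest index wins on ties
        U.append([1 if j == j_star else 0 for j in range(len(reps))])
    return U


def cost_J(points, reps, U):
    """J(theta, U) = sum_i sum_j u_ij * ||x_i - theta_j||^2 (equation 3.10)."""
    return sum(
        U[i][j] * sq_dist(points[i], reps[j])
        for i in range(len(points))
        for j in range(len(reps))
    )


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
reps = [(1.0, 0.0), (10.0, 0.0)]
U = membership_matrix(points, reps)
print(U)                        # [[1, 0], [1, 0], [0, 1]]
print(cost_J(points, reps, U))  # 2.0
```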

The two major factors that can affect the k-means algorithm, and consequently may have an impact on its performance, are the choice of the initial centroids and the selection of the number of clusters. A poor choice of initial centroids can make the algorithm converge to clusters corresponding to local minima of the cost function [170]. To avoid this initialisation issue, several initialisation methodologies have been proposed. Hartigan and Wong proposed a method based on nearest-neighbour density, Milligan used the results obtained by means of agglomerative hierarchical clustering, and the popular k-means++ carefully selects the initial centroids following a simple probabilistic approach [169]. In this work, the criterion adopted to circumvent this handicap is to run a number of random initialisations for each of the cluster numbers selected to run the clustering algorithm. For each choice of k, the best result of the k-means algorithm (the one with the lowest cost function) is selected from the $n$ different random initialisations. The numbers of random initialisations were set to 10, 50 and 100. These three different numbers of randomisations were considered in order to study the influence of the initial centroid estimation on the final result of the clustering algorithm.

Table 3.1: k-means clustering algorithm pseudocode.

k-means clustering algorithm
1. Select the range of numbers of clusters, $m = [2, 10]$
2. Repeat for each $m_i$
     Repeat for each $n$ ($n$ = 1 to 10, 1 to 50 or 1 to 100)
       - Random initialisation of the initial centroids
       - Form clusters by assigning each point to its closest centroid (cost function $J(\theta, U)$)
       - Re-compute the centroids
       Until the convergence criterion is met
     Select and store the minimum of $J(\theta, U)$
3. Plot $J(\theta, U)$ versus $m$
4. Select the $m_i$ value showing the most significant knee.
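
Steps 1 to 3 of Table 3.1 can be sketched in plain Python as follows. This is an illustrative reconstruction under stated assumptions, not the code used in this work: convergence is taken to mean that the cluster assignments stop changing, an empty cluster keeps its previous centroid, and step 3 returns the stored minima instead of plotting them. The function names and the `seed` parameter are hypothetical; the defaults follow the text (m from 2 to 10, n = 10), while the usage example uses a smaller range because the toy dataset has only nine points.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from a random initialisation; returns its cost J(theta, U)."""
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Form clusters: assign each point to its closest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # convergence criterion: no assignment changed
            break
        labels = new_labels
        # Re-compute the centroids (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def sweep_cluster_numbers(points, m_min=2, m_max=10, n_init=10, seed=0):
    """Steps 1-3: for each m, keep the minimum J(theta, U) over n_init random starts."""
    if m_max > len(points):
        raise ValueError("need at least m_max points to pick m_max initial centroids")
    rng = random.Random(seed)
    return {
        m: min(kmeans_once(points, m, rng) for _ in range(n_init))
        for m in range(m_min, m_max + 1)
    }


# Three tight, well-separated groups of points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
          (0.0, 10.0), (0.1, 10.0), (0.0, 10.1)]
costs = sweep_cluster_numbers(points, m_min=2, m_max=4, n_init=50)
for m, j in costs.items():
    print(m, round(j, 3))
```

Restarting from several random initialisations and keeping the lowest cost is what makes the stored curve less sensitive to the local minima mentioned above; a larger `n_init` trades run time for a more reliable minimum.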

In order to deal with the second problem, the selection of the number of clusters, an initial range of possible cluster numbers $m = [m_{min}, m_{max}]$ that could describe the dataset $\chi$ is defined [170]. This initial range is set between 2 and 10 clusters. For each value in this range, the algorithm is randomly initialised $n$ times, and the minimum value of the cost function $J(\theta, U)$ is calculated and saved. The simplest way to estimate the right number of clusters is to plot the stored values of the cost function against the corresponding number of clusters $m$. If the plotted graph shows a significant local change, popularly known as a significant knee, at a clustering number $m_i$, it can be said that the optimal number of clusters for the studied dataset is $m_i$. The absence of a significant knee on the graph is a clear indicator that the particular dataset has no clustering structure [170].

Another issue arises when more than one knee appears in the graph of the cost function versus the number of clusters $m$. In this case the convention followed within the machine learning literature is to select the earliest and most prominent knee as the likely one to determine the right number of clusters [134, 154]. The steps of the k-means clustering algorithm described in this section are listed in Table 3.1.
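
The knee is described above as something read off the plot. As a hedged illustration only, the sketch below replaces that visual judgement with a simple numerical proxy that is not part of the original method: the knee is taken as the cluster number with the largest discrete second difference of the stored costs, the earliest candidate wins on ties (in the spirit of the earliest-knee convention), and `None` is returned when no second difference exceeds a hypothetical threshold `min_score`, mirroring the "no significant knee" case. It assumes consecutive values of $m$, so the first and last values of the range can never be selected.

```python
def select_knee(costs, min_score=0.0):
    """Pick the cluster number m at the most pronounced knee of J(theta, U) versus m.

    costs: dict mapping consecutive cluster numbers m to the stored minimum J.
    Returns None when no knee scores above min_score.
    """
    ms = sorted(costs)
    best_m, best_score = None, float("-inf")
    for prev, m, nxt in zip(ms, ms[1:], ms[2:]):
        # Drop in J before m minus drop after m: large when the curve bends sharply at m.
        score = (costs[prev] - costs[m]) - (costs[m] - costs[nxt])
        if score > best_score:  # strict '>' keeps the earliest knee on ties
            best_m, best_score = m, score
    if best_m is None or best_score <= min_score:
        return None
    return best_m


print(select_knee({2: 150.0, 3: 0.04, 4: 0.03, 5: 0.02}))  # 3
print(select_knee({2: 10.0, 3: 8.0, 4: 6.0, 5: 4.0}))      # None (straight line, no knee)
```

A sensible `min_score` depends on the scale of the cost function, so in practice the automated choice would still need to be checked against the plot.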

Prior to the application of the incremental clustering algorithm to the instantaneous phase differences dataset, a process of unwrapping needs to be performed. Phase is circular in nature and, consequently, phase differences are circular too; to avoid this problem, the wavelet-based instantaneous phase differences should always lie between $-\pi$ and $\pi$ [134]. In addition, a normalisation process is performed across all of the electrode pairs by means of the maximum and minimum values of the instantaneous phase difference. As a result of the normalisation process, all the transformed values will be within the range [0, 1]. After these unwrapping and normalisation steps, the instantaneous phase differences are ready to be fed into the clustering algorithm.
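
The two preprocessing steps above can be sketched as follows. What the text describes as unwrapping is mapping every instantaneous phase difference into the interval between $-\pi$ and $\pi$, which is what the hypothetical `wrap_phase` does here (this is not the continuity-restoring operation performed by functions such as NumPy's `np.unwrap`). The normalisation is read as a single global minimum and maximum taken across all electrode pairs and time instants; this reading, the data layout (one row per time instant, one column per electrode pair) and the function names are assumptions.

```python
import math


def wrap_phase(delta):
    """Map a phase difference in radians into the interval [-pi, pi)."""
    return (delta + math.pi) % (2 * math.pi) - math.pi


def minmax_normalise(rows):
    """Scale every value to [0, 1] with one global min and max over all electrode pairs."""
    flat = [v for row in rows for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant data: avoid division by zero
        return [[0.0] * len(row) for row in rows]
    return [[(v - lo) / (hi - lo) for v in row] for row in rows]


# Rows are time instants, columns are electrode pairs (phase differences in radians).
raw = [[3 * math.pi / 2, 0.0],
       [math.pi / 4, -math.pi / 4]]
wrapped = [[wrap_phase(v) for v in row] for row in raw]
print(wrapped)                    # first value is about -pi/2
print(minmax_normalise(wrapped))  # approximately [[0.0, 0.667], [1.0, 0.333]]
```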

The dataset $\chi$ is formed from all the instantaneous phase differences as calculated in equation 3.9, $\chi = \{\Delta\phi^F(t_1), \Delta\phi^F(t_2), \ldots, \Delta\phi^F(t_n)\}$. Once this dataset is unwrapped and normalised, it is clustered along each time instant $t$ to investigate the possible underlying patterns within a specific frequency band. The clustering results yield the right number of clusters $k$ and the partition that minimises the cost function; for each of these clusters, information regarding the centroids and cluster labels is saved. The cluster labels, with a length of $n$ (one label for each time instant $t$), hold information about the state transitions, whereas the centroids give the averaged information for each of the $k$ states defined by the clustering algorithm [134].
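
Since the label sequence has one entry per time instant, the state transitions and the time spent in each state can be read directly from it. The sketch below uses a made-up label sequence and hypothetical function names; it illustrates the idea and is not the analysis code of this work.

```python
from collections import Counter


def state_transitions(labels, times=None):
    """List (time, from_state, to_state) for every change in the cluster label sequence."""
    if times is None:
        times = list(range(len(labels)))
    return [
        (times[i], labels[i - 1], labels[i])
        for i in range(1, len(labels))
        if labels[i] != labels[i - 1]
    ]


def state_occupancy(labels):
    """Number of time instants assigned to each state (cluster)."""
    return dict(Counter(labels))


labels = [0, 0, 1, 1, 1, 2, 0]
print(state_transitions(labels))  # [(2, 0, 1), (5, 1, 2), (6, 2, 0)]
print(state_occupancy(labels))    # {0: 3, 1: 3, 2: 1}
```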

Using this information, two types of graphics can be drawn. On one hand, the clustering labels for the $k$ different states explaining the dataset can be plotted versus the time instants $t = \{t_1, t_2, \ldots, t_n\}$ to show at which time instant each state occurred and how the states transition over time. On the other hand, the clustering centroids can be used to translate the unique states into topographic maps. To outline these head topographies, an average of the phase difference matrices is first calculated. As each matrix is symmetric, the average can equally be taken by rows or by columns. Each value resulting from this averaging step is assigned a colour after a normalisation process by means of the maximum and minimum values. Colours are assigned as magenta tones for higher values, meaning larger averaged phase difference,
