3.2 Methodology
3.2.2 Clustering
Once the phase difference matrices for a specific frequency band and time interval are obtained, the next logical step is to investigate whether there is any underlying pattern in the phase differences. A pattern recognition technique is needed to discover significant patterns of features in the phase synchronisation obtained from the algorithm described in the previous section. The k-means [166] clustering algorithm is the most widely used partitional clustering algorithm. It has applications across a broad range of data mining problems [167], because it is one of the simplest and most efficient algorithms in the field of data clustering.
k-means clustering assumes that the number of underlying clusters is known. It starts by randomly choosing k points as the initial centroids. Each point of the dataset is then assigned to the closest centroid according to a specific proximity measure, and the cost function to be minimised is built on that measure. Once the clusters are formed, the centroid of each cluster is updated. These two steps are repeated iteratively until the centroids no longer change or a chosen convergence criterion is met.
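The two alternating steps described above can be sketched as follows. These steps are assigning points to the nearest centroid and then recomputing each centroid as the mean of its points, and they are commonly called Lloyd iterations. This is a minimal illustration that assumes NumPy and a data matrix `X` with one row per point. The function name `kmeans`, the tolerance `tol` and the iteration cap `max_iter` are hypothetical choices and do not come from the implementation used in this work.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-8, rng=None):
    """Plain k-means (Lloyd iterations) from randomly chosen initial centroids.

    X: array of shape (N, d). Returns (centroids, labels).
    """
    rng = np.random.default_rng(rng)
    # Randomly choose k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        shift = np.abs(new - centroids).max()
        centroids = new
        if shift <= tol:  # convergence criterion: centroids (almost) unchanged
            break
    # Final assignment so that the labels match the returned centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)
```

For two well-separated groups of points, `kmeans(X, 2)` should place each group under its own label.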
The cost function selected for this particular study uses the Euclidean distance as the dissimilarity measure. Other proximity measures that can also be used include the Manhattan distance and cosine similarity [168]. This choice can significantly affect the centroid assignment and the quality of the final selection. In this case the Euclidean distance was selected because it is the most popular choice and consequently the most tested option [169].
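The short example below shows how the choice of proximity measure can change which centroid a point is assigned to. It is a hypothetical illustration that assumes NumPy. The helper names are invented for this sketch, and cosine similarity is turned into a distance as one minus the similarity so that smaller always means closer.

```python
import numpy as np

def euclidean(x, c):
    return float(np.sqrt(((x - c) ** 2).sum()))

def manhattan(x, c):
    return float(np.abs(x - c).sum())

def cosine_distance(x, c):
    # 1 - cosine similarity, so that a smaller value means "closer".
    return float(1.0 - x @ c / (np.linalg.norm(x) * np.linalg.norm(c)))

def closest(x, centroids, dist):
    """Index of the centroid nearest to x under the given distance function."""
    return int(np.argmin([dist(x, c) for c in centroids]))

x = np.array([1.0, 1.0])
C = [np.array([3.0, 3.0]), np.array([0.5, 0.0])]
# Euclidean and Manhattan pick C[1] (nearer in space);
# cosine picks C[0] (same direction as x).
```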
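$$J(\theta, U) = \sum_{i=1}^{N} \sum_{j=1}^{k} u_{ij} \, \lVert x_i - \theta_j \rVert^2 \qquad (3.10)$$

The cost function used within this chapter is defined in equation 3.10, where:

- $\theta = [\theta_1^T, \ldots, \theta_k^T]^T$ are the cluster representatives (or simply representatives), which correspond to points of the data space.
- $\lVert \cdot \rVert$ is the Euclidean norm, so that $\lVert x_i - \theta_j \rVert$ is the Euclidean distance between $x_i$ and $\theta_j$.
- $x_i$ is the $i$-th element of the dataset $\chi = \{x_1, x_2, \ldots, x_N\}$.
- $u_{ij} = 1$ if $x_i$ lies closer to $\theta_j$ than to any other representative; otherwise $u_{ij} = 0$ [170].

In this case the dataset $\chi$ is the complete range of instantaneous phase differences for each pair of EEG electrodes, expressed as a function of time and averaged over a particular frequency band of interest. These phase differences are calculated as explained in section 3.2.1.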
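Equation 3.10 can be evaluated directly from a data matrix and a set of representatives. This NumPy sketch is illustrative only and uses hypothetical function names. It builds the hard membership matrix $U$ (with $u_{ij} = 1$ for the closest representative) and then sums $u_{ij} \lVert x_i - \theta_j \rVert^2$ over all points and clusters.

```python
import numpy as np

def membership_matrix(X, theta):
    """Hard memberships: u_ij = 1 if x_i is closest to theta_j, else 0."""
    d2 = ((X[:, None, :] - theta[None, :, :]) ** 2).sum(axis=2)
    U = np.zeros_like(d2)
    U[np.arange(len(X)), d2.argmin(axis=1)] = 1.0
    return U

def cost(X, theta, U):
    """J(theta, U) = sum_i sum_j u_ij * ||x_i - theta_j||^2  (equation 3.10)."""
    d2 = ((X[:, None, :] - theta[None, :, :]) ** 2).sum(axis=2)
    return float((U * d2).sum())

X = np.array([[0, 0], [0, 2], [10, 0]], dtype=float)
theta = np.array([[0, 1], [10, 0]], dtype=float)
U = membership_matrix(X, theta)
# The first two points belong to theta_1 (squared distance 1 each) and the
# third coincides with theta_2, so J = 1 + 1 + 0 = 2.
```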
Two major factors can affect the k-means algorithm and consequently its performance: the choice of the initial centroids and the selection of the number of clusters. A poor choice of initial centroids may lead the algorithm to converge to clusters corresponding to local minima of the cost function [170]. Several initialisation methodologies have been proposed to avoid this issue. Hartigan and Wong proposed a method based on nearest-neighbour density, and Milligan used the results obtained by agglomerative hierarchical clustering. The popular k-means++ carefully selects the initial centroids following a simple probability-based approach [169]. In this work, the strategy adopted to circumvent this handicap is to perform a number of random initialisations for each number of clusters with which the clustering algorithm is run. For each choice of k, the best result of the k-means algorithm, that is, the one with the lowest value of the cost function, is selected from the n different random initialisations. The number of random initialisations was set to 10, 50 and 100. These three values were used to study how the initial centroid estimation influences the final result of the clustering algorithm.
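The restart strategy described above can be sketched as follows, assuming NumPy. The idea is to run k-means n times from different random initial centroids and keep the run with the lowest cost $J(\theta, U)$. The names `kmeans_once` and `best_of_n` are hypothetical, and the k-means routine is repeated here so that the block is self-contained.

```python
import numpy as np

def kmeans_once(X, k, rng, max_iter=100):
    """One k-means run from random initial centroids; returns (J, centroids, labels)."""
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):  # centroids no longer change
            break
        centroids = new
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    J = float(d2[np.arange(len(X)), labels].sum())  # cost of equation 3.10
    return J, centroids, labels

def best_of_n(X, k, n, seed=None):
    """Keep the run with the minimum cost among n random initialisations."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, k, rng) for _ in range(n)]
    return min(runs, key=lambda run: run[0])
```

With n set to 10, 50 or 100, the returned cost can only stay the same or decrease as n grows for a fixed random stream. This is why comparing the three settings indicates how sensitive the result is to initialisation.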
Table 3.1: k-means clustering algorithm pseudocode.
k-means clustering algorithm
1. Select the range of numbers of clusters, m = [2, 10].
2. Repeat for each m_i:
     Repeat for each n_j (n = 1 to 10, 1 to 50 or 1 to 100):
       - Randomly initialise the centroids
       - Repeat:
           - Form clusters by assigning each point to its closest centroid (cost function J(θ, U))
           - Re-compute the centroids
         Until the convergence criterion is met
     Select and store the minimum of J(θ, U)
3. Plot J(θ, U) versus m.
4. Select the m_i value showing the most significant knee.
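Steps 1 and 2 of Table 3.1 can be sketched as a sweep over the candidate numbers of clusters. The sweep keeps the minimum cost over n random initialisations for each m, which gives the values plotted in step 3. This is a hypothetical NumPy illustration: `kmeans_cost` and `cost_curve` are invented names, and the k-means loop is repeated so that the block runs on its own.

```python
import numpy as np

def kmeans_cost(X, k, rng, max_iter=100):
    """One k-means run from random initial centroids; returns the final J(theta, U)."""
    c = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        labels = ((X[:, None] - c[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else c[j]
                        for j in range(k)])
        if np.allclose(new, c):
            break
        c = new
    # Sum of squared distances from each point to its closest centroid.
    return float(((X[:, None] - c[None]) ** 2).sum(axis=2).min(axis=1).sum())

def cost_curve(X, m_range=range(2, 11), n=10, seed=None):
    """Steps 1-2 of Table 3.1: for each m, store the minimum J over n random initialisations."""
    rng = np.random.default_rng(seed)
    return {m: min(kmeans_cost(X, m, rng) for _ in range(n)) for m in m_range}
```

The resulting dictionary maps each m to its stored minimum of $J(\theta, U)$ and can be plotted against m for step 3.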
To deal with the second problem, the selection of the number of clusters, an initial range of possible numbers of clusters $m = [m_{min}, m_{max}]$ is defined, within which the number of clusters that best describes the dataset $\chi$ is expected to lie [170]. This initial range is set between 2 and 10 clusters. For each value in this range, the algorithm is randomly initialised n times, and the minimum value of the cost function $J(\theta, U)$ is calculated and saved. The simplest way to estimate the right number of clusters is to plot the stored values of the cost function against the corresponding number of clusters m. If the plotted graph shows a significant local change at a clustering number $m_i$, popularly known as a significant knee, then $m_i$ can be taken as the optimal number of clusters for the studied dataset. The absence of a significant knee on the graph is a clear indicator that the particular dataset has no clustering structure [170]. Another issue arises when more than one knee appears in the graph of the cost function versus the number of clusters m. In this case, the convention followed within the machine learning literature is to select the earliest and most prominent knee as the one most likely to determine the right number of clusters [134, 154]. The steps of the k-means clustering algorithm described in this section are listed in Table 3.1.
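In this work the knee is identified by inspecting the plot. One simple way to automate that inspection is to pick the m with the largest discrete second difference of $J$. This is an assumed heuristic and is not the procedure described above. The function name is hypothetical. The strict comparison keeps the earliest m when two candidates are equally sharp, which is consistent with the "earliest and most prominent knee" convention.

```python
def knee_by_second_difference(curve):
    """Return the m with the largest discrete second difference of J(m).

    curve: dict {m: minimum J} over consecutive values of m.
    """
    ms = sorted(curve)
    best_m, best_val = None, float("-inf")
    for prev, m, nxt in zip(ms, ms[1:], ms[2:]):
        # Discrete curvature: large when J drops sharply before m and flattens after it.
        curvature = curve[prev] - 2 * curve[m] + curve[nxt]
        if curvature > best_val:  # strict '>' keeps the earliest m on ties
            best_m, best_val = m, curvature
    return best_m

J = [100, 40, 30, 25, 21, 18, 16, 14.5, 13]  # illustrative values for m = 2..10
curve = dict(zip(range(2, 11), J))
```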
Before the incremental clustering algorithm is applied to the instantaneous phase differences dataset, an unwrapping process needs to be carried out. Phase is circular in nature, and consequently phase differences are circular too. To avoid this problem, the wavelet-based instantaneous phase differences should always be kept within the interval $[-\pi, \pi]$ [134]. In addition, a normalisation process is performed across all of the electrode pairs using the maximum and minimum values of the instantaneous phase differences. As a result of this normalisation, all the transformed values lie within the range [0, 1]. After these unwrapping and normalisation steps, the instantaneous phase differences are ready to be fed into the clustering algorithm.
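The two preprocessing steps above can be sketched with NumPy as follows. Phase differences are mapped into $[-\pi, \pi)$, and then a single global minimum and maximum are used to rescale the values to [0, 1]. This sketch assumes an array shaped (time instants, electrode pairs). It also reads "across all of the electrode pairs" as one min-max over the whole array, which is an interpretation rather than a confirmed detail. The function names are hypothetical.

```python
import numpy as np

def wrap_to_pi(phase_diff):
    """Map phase differences (radians) into the interval [-pi, pi)."""
    return (phase_diff + np.pi) % (2 * np.pi) - np.pi

def minmax_normalise(values):
    """Rescale all values into [0, 1] with one global min and max (assumes max > min)."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)

# Example: a phase difference of 3*pi/2 wraps to -pi/2.
```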
The dataset $\chi$ is formed from all the instantaneous phase differences as calculated in equation 3.9, $\chi = \{\Delta\phi^F(t_1), \Delta\phi^F(t_2), \ldots, \Delta\phi^F(t_n)\}$. Once this dataset is unwrapped and normalised, it is clustered along each time instant t to investigate possible underlying patterns within a specific frequency band. The clustering algorithm yields the right number of clusters k. For the solution minimising the cost function, the centroids and cluster labels of each of these clusters are saved. The cluster labels have length n, with one label for each time instant t, and hold information about the state transitions. The centroids give the averaged information for each of the k states defined by the clustering algorithm [134].
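The state-transition information held in the label sequence can be extracted as follows. This is a small illustrative sketch that assumes NumPy and labels indexed by time instant, and its function names are hypothetical. It finds the time indices where the label changes and summarises the sequence as runs of (state, start index, duration).

```python
import numpy as np

def state_transitions(labels):
    """Indices t at which the cluster label differs from the label at t - 1."""
    labels = np.asarray(labels)
    return np.flatnonzero(labels[1:] != labels[:-1]) + 1

def state_segments(labels):
    """Run-length encoding of the label sequence: list of (state, start_index, length)."""
    labels = np.asarray(labels)
    starts = np.concatenate(([0], state_transitions(labels)))
    ends = np.concatenate((starts[1:], [len(labels)]))
    return [(int(labels[s]), int(s), int(e - s)) for s, e in zip(starts, ends)]

labels = [0, 0, 1, 1, 1, 0, 2]
# Transitions occur at t = 2, 5 and 6.
```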
Using this information, two types of graphics can be drawn. On the one hand, the cluster labels for the k different states explaining the dataset can be plotted against the time instants $t = \{t_1, t_2, \ldots, t_n\}$. This shows at which time instant each state occurred and how the states transition over time. On the other hand, the cluster centroids can be used to translate the unique states into topographic maps. To outline these head topographies, an average of the phase difference matrices is first calculated. Because the matrix is symmetric, the average can equally be taken over rows or over columns. Each value resulting from this averaging step is then assigned a colour after a normalisation process based on the maximum and minimum values. Colours are assigned as magenta tones for higher values, meaning larger averaged phase difference,