Deep Long Short-term Memory Structures Model Temporal Dependencies Improving Cognitive Workload Estimation

(1)

AFIT Scholar

Faculty Publications

7-15-2017

Deep Long Short-term Memory Structures Model Temporal

Dependencies Improving Cognitive Workload Estimation

Ryan G. Hefron

Air Force Institute of Technology

Brett J. Borghetti

Air Force Institute of Technology

James C. Christensen

Air Force Research Laboratory

Christine M. Schubert Kabban

Air Force Institute of Technology

Follow this and additional works at:

https://scholar.afit.edu/facpub

Part of the

Cognitive Neuroscience Commons

Recommended Citation

Hefron, R. G., Borghetti, B. J., Christensen, J. C., & Schubert Kabban, C. M. (2017). Deep long short-term

memory structures model temporal dependencies improving cognitive workload estimation. Pattern

Recognition Letters, 94(15 July), 96–104. https://doi.org/10.1016/j.patrec.2017.05.020

This Article is brought to you for free and open access by AFIT Scholar. It has been accepted for inclusion in

Faculty Publications by an authorized administrator of AFIT Scholar. For more information, please contact

(2)

ContentslistsavailableatScienceDirect

Pattern

Recognition

Letters

journalhomepage:www.elsevier.com/locate/patrec

Deep

long

short-term

memory

structures

model

temporal

dependencies

improving

cognitive

workload

estimation

Ryan

G. Hefron

a,∗

_,

_Brett

_J.

_Borghetti

a

_,

_{James C.}

_Christensen

b

_,

Christine

M. Schubert

Kabban

c

a_Deptpartment_of_Electrical_&_Computer_Engineering,_Air_Force_Institute_of_Technology,_WPAFB,_OH_45433,_USA b_Air_Force_Research_Laboratory,_WPAFB,_OH_45433,_USA

c_Deptpartment_of_Mathematics_&_Statistics,_Air_Force_Institute_of_Technology,_WPAFB,_OH_45433,_USA

a

r

t

i

c

l

e

i

n

f

o

Articlehistory:

Received30November2016 Availableonline20May2017

MSC: 41A05 41A10 65D05 65D17 Keywords:

Psychophysiologicalworkloadestimation EEG

Electroencephalograph LSTM

Longshort-termmemory Temporalnonstationarity Temporaldependence Day-to-dayvariability Time-seriesanalysis Recurrentneuralnetwork Operatorfunctionalstateassessment Human-machineteams

a

b

s

t

r

a

c

t

Usingdeeply recurrentneuralnetworks toaccountfortemporal dependenceinelectroencephalograph (EEG)-basedworkloadestimationis showntoconsiderablyimprove day-to-dayfeaturestationarity re-sultinginsignificantlyhigheraccuracy(p<.0001)thanclassifierswhichdonot considerthetemporal dependenceencodedwithintheEEGtime-seriessignal.Thisimprovementisdemonstratedbytraining severaldeepRecurrentNeuralNetwork(RNN)modelsincludingLongShort-TermMemory(LSTM) archi-tectures,afeedforwardArtificialNeuralNetwork(ANN), andSupportVectorMachine(SVM)modelson datafromsixparticipantswhoeachperformseveralMulti-AttributeTaskBattery(MATB)sessionsonfive separatedaysspreadoutoveramonth-longperiod.Eachparticipant-specificclassifieristrainedonthe firstfourdaysofdataandtestedusingthefifth’s.Averageclassificationaccuracyof93.0%isachieved us-ingadeepLSTMarchitecture.Theseresultsrepresenta59%decreaseinerrorcomparedtothebest previ-ouslypublishedresultsforthisdataset.Thisstudyadditionallyevaluatesthesignificanceofnewfeatures: allcombinationsofmean,variance,skewness,andkurtosisofEEGfrequency-domainpowerdistributions. Meanandvariancearestatisticallysignificantfeatures,whileskewnessandkurtosisarenot.Theoverall performanceofthisapproachishighenoughtowarrantevaluationforinclusioninoperationalsystems.

PublishedbyElsevierB.V. ThisisanopenaccessarticleundertheCCBYlicense.(http://creativecommons.org/licenses/by/4.0/ )

1. Introduction

Teamscomposedofbothhumansandmachinescanpotentially work together to mitigate their respective inherent weaknesses. Acomputer’s strength is manifested in its ability to quickly and correctlycompute answers, while humans exhibit superior ﬂexi-bilityofresponsetounexpectedsituations.Thus,Human-Machine Teams(HMTs)promisetomitigateinherentlimitationson compu-tationaldecision-makinginall-humanteamswhilesimultaneously reducingthe brittlenessandinﬂexibility offully-autonomous sys-tems[11].Team outcomesareimprovedwhen oneagent(human or computer) assists another in the right way at the right time

[7].Forcomputerstohelphumans inHMTs,they mustknowthe human’scognitive state;thisknowledge can beobtainedthrough

∗ _{Corresponding}_author.

E-mailaddress:ryan.hefron@aﬁt.edu(R.G.Hefron).

operator functional state assessment (OFSA) [45]. Several meth-odsofOFSAexist,whichcangenerallybebrokenintotwoclasses ofmeasures–objectiveandsubjective. Subjectivemeasuresusually asktheoperatortoevaluatethemselveseitherduringorafterthe task,whileobjectivemeasures usea physiologicalsensor suchas electroencephalograph(EEG)orelectrocardiogram(ECG)toprovide inputstoanalgorithmthatassessestheoperator’sfunctionalstate. The beneﬁt of objective measures is that they do not interrupt the operator while performing the task[41,42]. Continuous non-interruptingstateassessmentisan importantcharacteristicfor vi-ableHMTsoutsidethelaboratory.

AkeysubareaofresearchwithinOFSAismentalworkload esti-mation.Enablingthemachineinahuman-machineteamto unob-trusively and continuously ascertain the operator’s mental work-load is the ﬁrst step in closing the machine-to-human augmen-tationloop. Inorder foraugmentationto be effective,itmust be drivenbyanaccurateestimateofmentalworkload[7].Acommon

http://dx.doi.org/10.1016/j.patrec.2017.05.020

(3)

method for estimating mental workload is to ﬁrst use statistical machinelearningtoﬁtamodelwhichenablespredictionofmental workloadfromthephysiologicalsignals,andthenusethatmodel to makementalworkload estimatesfromnewly-gathered physio-logicalsignals[44].

TheutilityofanOFSAsystemwilldependonthebenefitsof ac-curate assessmentandthecostsoferrors.Thiscost-benefit trade-off willbeapplication-specificanddifferentforcorrectly identify-ing highandlowworkloadstatesdependingonthetypesof aug-mentation tied to a given state and the consequences of incor-rect/inappropriateactivationorlackofactivation.Theseerrors di-rectlyimpactanoperator’strustintheautomation,in-turn affect-ingfutureutilityofthatautomationinaclosedloop-fashion[28]. Rouse et al.[35] indicated that a 95% accuracy rate forworkload estimation may be required fora systemto be acceptable. Para-suramanetal.[31]wentfurtherandsuggestedthatifthesystem doesnotapproach100%accuracythenthecostsofinaccuracyand lackoftrustmayleadtothesystembeingunacceptable,especially insafety-criticalenvironments.

Unfortunately,currentstate-of-the-artsystemsarenotyetable to achieve the requiredaccuracy, due inpart to thechallenge of temporalnon-stationarityinpsychophysiologicalsignals.This chal-lengerelates tovariation overlonger periodsoftime and depen-dencewithinshorterperiods.Bothcannegativelyimpactthe gen-eralizablelongtermaccuracyofworkloadassessmentsystems[7]. Withinshorterspansoftime,signalstendtoexhibithysteresis or serialdependence.Thissuggeststhatthereisinherentstructurein thestatefulness inthebrainthatcan beexploitedwith appropri-ate machine learningtechniques. While it isdiﬃcult to attribute thisdependence toanydiscrete set offactors,some of thelikely possibilitiesincludeconsistencyindefaultmode activity[34] and hysteresisexhibitedbymostphysiologicalsystems.

In the context ofmachine learning, temporal non-stationarity canbeaddressedintwoways.Thefirstisthroughfeature genera-tionorselection.Abettersetoffeatureswillexhibitlesslong-term non-stationarityandwillleadtobettermodelperformance.Inthis work, weexamineseveralfeaturegeneration techniquesto deter-mineempiricallyifcertain featuresetsaresuperiortoothers.The secondwaytoaddressnon-stationaritywithmachinelearningisto usealgorithms that make differentassumptions aboutthenature ofthedatabeingprocessed.Asitstands,mostpublishedresearch on operatorworkloadestimation implicitlyassumedtemporal in-dependence from one time segment to the next. This is likely a poorassumption dueto boththe factorsdiscussed above aswell aslongertermeffects suchasfatigueandperformance hysteresis withmentalworkload transitions[22].An examplefromaviation illustrates this nicely. If a pilot has just completed flying an in-strumentapproach ininstrumentmeteorologicalconditions(IMC) whenanunexpectedemergencyrequiresattention,pilotworkload willincreasedifferentlythanifthepilothadthesameunexpected emergencyarisefollowingaperiodofautopilot-onflightat cruis-ing altitudein visual meteorological conditions(VMC). This sim-pleexampleillustratesthatwhathashappenedintherecentpast temporally,mattersforoperatorworkloadassessment.

Machine learningalgorithmsthat considerpastinformation as well ascurrent information when ﬁtting models should perform better.Suchalgorithmsmustbeabletolearnatemporal represen-tationofthe data.Acommonmodelused formodelingtemporal dataistheRecurrentNeuralNetwork(RNN).RNNsareneural net-worksthat are ableto learn sequencesthat are not composedof independent,identicallydistributedobservations[16].Rather,they areabletoelicitthecontextofobservationswithinsequencesand accurately classify sequences that have strong temporal correla-tions[16].Historically,RNNshadlimitationswhentrainingmodels withmorethan10–20timesteps whichledtopoorperformance. Incorporatinglongertime-seriesdatastreamswouldcause compu-tationalsensitivityproblemsthatstymiedRNNtraining.

Recent developments have resulted in RNN architectural and training advances which mitigate these computational problems andallowmuch longer temporalsequences to beprocessed. One approachis theLong Short-Term Memory(LSTM) layer. LSTM ar-chitecturesextendthelengthofsequencesthatcanbeconsidered by aRNN by overcomingcomputational sensitivities encountered duringbackpropagation[21].Forthesereasons,theymayoffer im-provedworkloadclassiﬁcationaccuracyoverothermethods when using EEG data. With these improvements in machine learning, thereis no longer areason toavoid incorporatingtemporal con-textinaworkloadmodel.Wecapitalizeonthesemachinelearning developmentsinourresearch.

The primary contribution ofthis research is demonstration of significantly improved cross-day workload classification accuracy byintegratingcontextuallyrelevantalgorithmicarchitectureswith improved feature generation techniques. We statistically evalu-ateallcombinations ofmean,variance,skewness, andkurtosis of frequency-domain power distributions and contrast a variety of RNN architectures, to include deeply stacked LSTMs, with base-line algorithms and features. Both linear and Radial Basis Func-tion(RBF)SupportVectorMachines(SVMs)andsingle-layer feed-forwardArtificialNeuralNetwork(ANNs)usingmean-onlyfeatures areused asbaselinecases.We show thatby accountingfor tem-poraldependenceusingdeepLSTM modelstrainedwithnew fea-turecombinations, we can maximize cross-day workload estima-tion accuracy resulting in a 58% reduction in classification error over baseline methods anda 59% decreasein error compared to thebestpublishedresultsforthisdataset.

2. Backgroundandrelatedwork

Temporalnon-stationarity of electroencephalograph (EEG) sig-nals within individuals islikely caused by a large numberof in-trinsicandextrinsicfactors.Participant motivationandmentalor physicalreadinessareexamplesofsomeintrinsicfactors;extrinsic factorsincludesigniﬁcantdifferencesinEEGelectrode placement, changes in conductance, and different motion artifacts [8,23,30]. Due to the challenge of handling these factors, cross-day non-stationarityofEEGsignalshasmotivatedanumberofrelated stud-iesincludingseveralusingthesamedatasetdescribedbelow.

2.1. Dataset

Dataforourinvestigationwasusedinthe2011CognitiveState AssessmentCompetition[13]andwasrecordedduringaprior hu-manresearchstudyperformedbyWilsonetal.[43].Eight partici-pantscompletedscenarioswithintheMulti-AttributeTaskBattery (MATB) [10]environment across five test days spreadout over a month-longperiod.Monitoring,communication,resource manage-ment,and trackingtasks were presented andmanipulated to in-duce threelevels ofdifficulty: low, medium,and high[8,43]. Re-sourceallocationerrors,monitoringtaskreactiontimes,and com-munication response times were recorded and used to validate that participantsexperienced distinct low andhighdifficulty lev-els.Participantsweretrainedtoasymptoticproficiencypriortothe firsttestday[43].

Foreachparticipant,horizontalelectrooculogram(HEOG), verti-calEOG(VEOG),and19channelsofEEGvoltages(accordingtothe International10–20System)were sampledat256Hz.Oneach of thefivedays,eachparticipantperformedthreefive-minutetrialsat low,medium, andhighdifficultyforatotalofninetrialsper day. Trialswerepresentedinarandomorderingwithtransitionperiods inbetween.Eachparticipantcompleteda 30srestingbaseline at thestart of each sessionprior to theMATB task.Onlysixof the participantswereusedinourstudyduetomissingdatafromtwo oftheoriginal eightparticipants[19].SimilartoChristensenetal.

(4)

[8]andCasson[5],weselecteddatafromthelow andhigh work-loadconditionsresultinginatotalof30conditions–15lowand15 high–foreachindividual.

2.2.Within-participantcross-dayvariability

Many researchers using this dataset focused their efforts on cross-participantvariability ora combinationof cross-participant and cross-day models rather than independently investigating cross-dayvariabilitywithinindividuals.Thissectionwillonly con-sider those works that exclusively examined within-participant cross-dayvariability.

In the workload literature using this dataset, three research teams focused solely on the performance of classifiers in cross-day analyses. Hefron and Borghetti evaluated the variance of frequency-domain power distributions as a feature and found it toimproveaccuracy forrandomforest, Linear Discriminant Anal-ysis(LDA),andK-NearestNeighbors(KNN)classifiersforthe two-class–highversuslow workload–problem. Modelsbuilt with vari-anceandmeanfeaturesexhibiteda5.8%increaseinaccuracyover modelsbuiltusingonlythemeanoffrequency-domainpower dis-tributions [19]. In their study, it was postulated that the use of skewnessandkurtosisasfeatures couldgeneratefurther gainsin classificationaccuracy[19].

In another study, Christensen et al. evaluated cross-day clas-sification performance of low versus high workload by training threeclassifiers:LDA,SupportVectorMachine(SVM),anda single-hidden-layerfeedforwardArtificialNeuralNetwork(ANN)[8].Two SVM implementations were used–one used a radial basis func-tionkernel and theother a linear kernel.The authors noted the linearkernel performed best andwas used forreporting results. Oneportionoftheexperimental designusedleave-one-out cross-validation, holding out one day’s data in each case. This proce-dure produced five separate testing periods that were used to evaluatealgorithmicperformance. Ofnote, theSVMachieved68% accuracy, while the ANN attained 83% accuracy [8]. The results demonstrated significantly better cross-day performance for the neural network compared to more traditional machine learning techniques, namely LDA and SVM classifiers. These results indi-catedthatneuralnetworksmayhavemorecapacitytohandle non-stationarityoffeature-to-targetmappings.

In a studyvery similar to Christensen’s, Casson [5] evaluated the effect of temporal stability of feedforward ANNs trained on EEGsignalswithdifferencesofseconds,minutes,hours,anddays betweencollectionoftrainingdata andtestingdata.While other interestingresultswerepresentedinthepaper,themostrelevant toourwork concerned participant-speciﬁc,leave-one-session-out, cross-validatedmodels.Thesemodelswereaimedatproducing ex-cellent cross-session predictive performance andattained an av-erageclassiﬁcation accuracy of 73%.Differences in preprocessing, training, and network architecture contributed to the differential betweentheseresultsandours.

In all of these investigations, the demonstrated classiﬁcation accuracy fell short of recommendations for use in many opera-tionalsettings.While accuracyrequirementswill betaskspeciﬁc, as discussed in Section 1, neither Rouse’s desired 95% accuracy rate [35] nor Parasuraman’s near 100% accuracy [31] condition weremetforworkloadestimation.Since allofthesemethods as-sumedtemporalindependencebetweentimesegments,webelieve newmodelsthataccountfortemporospatialdependenciesandthe useofnewfeaturesshouldbeexplored.

2.3.RNNsandLSTMs

Deep neural networks have been used extensively to achieve state-of-the-artresultsacrossawiderangeofcategoriesincluding

Fig.1. Temporalunfoldingoftherecurrentunitinthecomputationalgraph.Onthe leftsideisthecyclicgraph.Allcycleshavebeenremovedfromthegraphonthe rightsidebyunfoldingthecyclicgraphintime.NotetheweightmatricesU,V,and Waresharedacrosstime.

imagerecognition,speechrecognition,translation,andimage cap-tioning[16,17,26,40].RecurrentNeuralNetworks(RNNs)areatype of deep network where depth is added via a recurrent connec-tioninthehiddenlayerwhichisusedtoprocesssequentialdata. Processingsequentialdataisfundamentallydifferentthan process-ingindependentidenticallydistributedobservationsbecauseofthe distributional dependenceon thesequence. In a traditional feed-forward ANN, each observation isprocessed individually and the networkstatedoesnotpersistwhileotherpotentially sequentially-dependent points are processed. Conversely, in a RNN, recurrent connectionspassstateinformationacrosstimestepsallowing pre-viously processed observations to affectthe subsequent observa-tions[29].

Feedforward ANNs have a directed acyclic computational graph–onelayerfeedstoanotherwithnocycles.RNNscanbe un-derstoodasanextensiontothefeedforwardstructureallowingfor cycles in the graph structure. Fig. 1 showshow very deep com-putationalgraphs canbe createdwhenthe recurrentconnections are unfolded in time. The process of unfolding a recurrent net-work simply requires removal of all cycles in the graph to form a directed acyclic graph [15]. To allow these networks to learn important pieces ofinformation that may be located atdifferent positions in thesequential data, theinput, recurrent, and output weight matrices’ parameters are shared[15]. Ifdifferent parame-ters were used foreach time step,no generalization across time wouldbepossible;sharingparametersallowsthenetworktokeep trackofstate.Sincetemporalnon-stationarityofEEGsignalsacross daysisa challenge,we neededto ensurethetemporal sequences suppliedtoourmodeldidnot exceedperiods oftime overwhich feature-to-targetnon-stationaritywouldoccur.

Inadditiontothe depthofaRNNduetounfolding overtime, depth can be added to the network when recurrent layers are stacked in a sequence-to-sequence fashion. This means that the output from one layer of a RNN returns a sequence of vectors whichformtheinputtothenextlayer.Gravesetal.demonstrated thatdeeplylayeringRNNshasamorebeneﬁcialeffectthanmerely adding memory cells [17]. The lower layers transform input se-quencesintomoreeasilylearnedrepresentationsinthehigher lay-ers,resultinginbetterclassiﬁcationaccuracy[15].

One problem with the simple recurrent structure shown in

Fig.1 isa resultofshared weights whichproduce vanishingand exploding gradients during backpropagation through time. This limitsthesequentialdepthofsimplerecurrentunitstosequences ofnogreaterthan10–20observationsbecausethesignalfrom dis-tantpositionswillnotpropagatetothecurrenttime[15].A signif-icant innovationwhich solved thisproblemwasLong Short-Term Memory (LSTM) [14,21]. LSTM provides the algorithm ﬁne con-trol over what is put into memory and removed from memory in the hidden layer. This is achieved by a combination of three gates whichcontrolﬂow intoandout ofthememorycell: an

(5)

in-Fig.2. ThisillustratestheinternalstructureofasingleLSTMmemorycellunfoldedintimewherextisthewindowedsequenceinputfromtimet,andhtisthehidden-vector

stateattimet.Weightmatricesarelabeledusingafrom-tosubscriptconvention,Wfrom,to.Biases,b,havesubscriptsassociatedwiththeircorrespondinggateoractivation.

Thefollowingoperationsareannotatedsymbolically:Xrepresentsmatrix-vectormultiplication,◦isaHadamard(element-wise)productbetweentwovectors,isused for summationsof3ormorevectors,+indicates theaddition of twovectors,finallyσ andtanharerespectivelyasigmoidfunctionandhyperbolictangentfunctionapplied element-wisetoavector.Thememorycellstatevectorpropagatesalongthedashedbluelineatthetopwhiletheinputvectorandhiddenvectorbusesareconnectedas shownbythegoldandblacklinesrespectively.Allthreeofthesevectorsshouldbethoughtofascolumnvectors.Thesizeofthehiddenvectorandmemorycellstatevector areequaltothenumberof“memorycells” specifiedduringinitializationoftheLSTMlayer.Threesigmoidfunctionsactasgatesandarelabeled:forget,input,andoutput. IfaLSTMreturnsasequence,thesequenceisaseriesofthehiddenvectorstateswitheachcorrespondingtotheoutputfromaparticulartimestepwithinthesupplied temporalwindow.Ifasequenceisnotreturned,typicallythelasthiddenstateisreturned.(Forinterpretationofthereferencestocolorinthisfigure,thereaderisreferred tothewebversionofthisarticle.)

put gate, a forget gate, andan output gate. Each gate works by usinganelement-wisesigmoidfunction,

σ

,toscaleeachelement of the gate vector to a value between0 and1. The gating func-tionality is accomplishedby takingthisvector ofvalues between 0 and1 andperformingan element-wisemultiplication with an-othervector,thusspecifyingwhatproportionofthesecondvector passesthroughthegate; andconversely,determiningwhichparts are blocked. Fig. 2 illustrates the internal structure of the LSTM. Forclarity, we willdescribe thestate of amemorycell attime t

withrecurrentconnectionsfromtimet−1,andtotimet+1. AsshowninFig.2,therearetwovectorsthatpersistfromtime

t−1:thehiddenvector,ht−1,andthememorycellstate,st−1.The

forgetgateft,asshowninEq.(1),determineswhattoremovefrom

thememorycellstate.Inotherwords,itforcesthememorycellto forget things thatare not importantbasedonerror backpropaga-tion:

ft=

σ

(

Wx fxt+Wh fht−1+bf

)

, (1)

wheretheweightmatrixWxfistheweightmatrixfromtheinputx

totheforgetgateft,xtistheinputattimet,Whfistheweight

ma-trixfromtheprevioushiddenvectorht−1totheforgetgateft,and bfistheforgetgatebias.Thefrom-tosubscriptconventionisused

todescribe eachweightmatrix.Theinputgateit determineshow

much each elementofthe candidateupdate vector, s˜t,should be

addedtothe correspondingmemorycellelement attime tbased on therecurrentconnectionfromthehiddenvector ht−1 andthe

sequentialinputattimet,xt.Thegatescalesthecandidateupdate

vectorwhichistheoutputfromafully-connectedtanhlayer:

it=

σ

(

Wxixt+Whiht−1+bi

)

(2)

˜

st=tanh

(

Wxsxt+Whsht−1+bs

)

. (3)

Thedelicate balancerequiredtomaintainmemorycell state over long sequences is attained by forgetting old information and in-corporatingnewinformation.Forgettingisaccomplishedby multi-plyingtheoldmemorycellstate bytheoutputoftheforget gate, while the new information is suppliedby adding the portion of eachvaluespeciﬁedbytheelement-wiseproductoftheinputgate withthecandidateupdate:

st=it∗s˜t+ft∗st−1. (4)

Finally,theoutputgate determineswhat shouldbe outputto the hiddenvectorfromthememorycellstate,giventhetemporal con-text ofthetime-step, to minimizeerror. Theoutput gate, ot,and

hiddenvector,ht,aregivenby:

ot=

σ

(

Wxoxt+Whoht−1+bo

)

(5)

ht=ot∗tanh

(

st

)

. (6)

2.4.RNNmodelsforEEGanalysis

Despite excellent results in other ﬁelds, relatively few re-searchershaveapplieddeepneuralnetworktechniquestoclassify EEGdata.Bashivanetal.[2]trainedadeeprecurrent-convolutional neural network accounting forboth temporal andspatial depen-denciesinthe network.Theybegan by performinga FastFourier Transform (FFT) on each time-series signal from each electrode andestimatingthepowerinthetheta (4–7Hz), alpha(8–13 Hz), and beta (13–30 Hz) frequency bands over 3.5 s working mem-oryexperimenttrialswithfourclassesoftaskdiﬃculty.Then,they created a time-series of images by performing a 2-d Azimuthal Equidistant Projection (AEP) for power in each of the three dif-ferentbands.Thispreserveddistancesfromeach electrodeto the

(6)

centerpoint ofthe EEGcap,thus accountingforsome spatial de-pendencies.These imageswere fed into aseries ofconvolutional andmaxpoolinglayers.Theirbestperformingmodelfedthe out-putfromtheconvolutional portion ofthearchitecture into a 1-d convolutionallayer,aswellasaLSTM layer,andmergedthe out-putfromboth oftheseintoafullyconnectedlayer.Thiswasthen connectedtoasoftmaxlayerwhichprovidedthefinalclassification result.Thiscomplexarchitectureused1.62millionparametersand wasableto reduce the classification errorfrom 15.34%to 8.89%; an impressive 42% reduction compared to a baseline radial basis function SVM [2]. While a large reduction inerror over baseline methods wasachieved, the complexity andamount of computa-tionrequiredto train such adeep networkis significant.Our re-searchsoughttoexamineifsimilarreductionsinerrorover base-linemethodscouldbeachievedusingdifferentpreprocessing tech-niquesanda lesscomplex deeparchitecture whichonlyused re-currentneuralnetworks.

TwootherresearchersusedaformoftheLSTMtoanalyzeEEG data. Davidson, et al. found that a small single-layer LSTM was able to identify lapses in attention based on EEG spectral data betterthan a tapped delay-line Multilayer Perceptron (MLP) dur-ingperformance ofa visuomotortracking task[12]. Theytrained theirnetworksina leave-one-out,cross-subject manner.Their re-sultsevaluatedtemporaldependenciesout toa lengthof6s and showedthataccountingfortemporaldependenceofupto4sprior toalapseimproveddetection.Weuseanextendedtemporal win-dow of 30 s in this study. In other work, Binz et al.[3] used a derivativeoftheLSTMunit,calledDynamicCortexMemory(DCM), toclassifyimaginedsensorimotorimagerydatafromaBrain Com-puter Interface (BCI) workshop competition [4]. A single hidden layerwasusedwitheightDCM unitsfollowedby asoftmax out-putlayer.Whiletheirresultsdidnotachievestate-of-theart accu-racy,severaladvantageswere presentintheir solution.Their net-workwasableto providereal-timeresults andtheirsolution did notneedtospecifyatimewindow,sincethenetworklearned ap-propriatetemporally-dependentsequences [3].Binz’sworklargely differedfromourssinceamixtureofwithin-participantand cross-participant models were used to evaluate trials that were sepa-rated from the training data by only seconds-to-minutes rather than days. The only similarity between our studies was that we bothusedformsoftheLSTMarchitecturetoaccountfortemporal dependenciesinEEGsignals.

In a medical application, several research groups

[18,27,36,39] successively improved performance of epilepsy diagnosisusingRNNs. Allfourgroupsusedvarious inputfeatures to train small Elman RNNs in a cross-subject manner using the datasetdescribed by Andrzejak et al. [1]. The Elman RNN archi-tecture has limited temporal representational capacity compared tothenetworkstrainedin ourinvestigation.Like ourstudy,each research group demonstrated the superiority of an RNN com-paredtoother neuralnetworksthatdidnot incorporatetemporal dynamics. However, unlike our experiment, the data sequences analyzeddidnot spantemporallengthsgreat enoughtoexamine day-to-dayvariability.Finally,GulerandUbeylibothdemonstrated that summary statistics (min, max, mean, standard deviation) of time-varyingfeaturedistributions canbeusefulinputfeatures for arecurrentmodel; however,no comparisonto abaseline feature setwasprovidedineithercase.

The primary difference betweenprevious research andoursis thatwedemonstrateimprovedwithin-participant,day-to-day fea-turestationaritybyaccountingfortemporaldependenciesin cog-nitiveactivity;whereastheaforementionedstudiesdonotaddress day-to-dayvariability.Anotherdifferenceisthatthepreceding re-searchfocusedoneithermedicalusesorstateestimationunrelated toworkload.Aﬁnal differentiatingfactorwasthatmanyprevious studieswereconductedpriortobreakthroughsthatenableddrastic

Table1

Testmatrixofallcombinationsofmean,variance, skewness,andkurtosisfeatures.∗ _Denotes_feature

setsthatwereincludedinmodelsthatincorporated themeanwhileallothersaremodelsthatdidnot incorporatethemean.

Testrun Featuresincludedindataset

1∗ _Mean 2 Variance 3 Skewness 4 Kurtosis 5∗ _Mean,_Variance 6∗ Mean,Skewness 7∗ _Mean,_Kurtosis 8 Variance,Skewness 9 Variance,Kurtosis 10 Skewness,Kurtosis 11∗ _Mean,_Variance,_Skewness

12∗ Mean,Variance,Kurtosis 13∗ _Mean,_Skewness,_Kurtosis

14 Variance,Skewness,Kurtosis 15∗ _Mean,_Variance,_Skewness,_Kurtosis

improvementsinnetworkperformancesuchasadvancesin initial-ization,optimization,andregularizationtechniques,aswell asthe rise of abundant computational capacity via Graphics Processing Unit(GPU)computingwhichenablesthetrainingoflarger,deeper networks.

3. Methodology

Thegoalofproducingseveralworkloadmodelstoevaluatethe eﬃcacyofnewfeatures andLSTM basedclassiﬁcationalgorithms requiredpreprocessingofthe EEGdatatoconvertit intothe fre-quencydomainandtoextractpowerindifferentfrequencybands asoutlined by [19].RawEEG datawastransformed into features inclinical frequencybands(delta (1–4Hz), theta (4–8Hz), alpha (8–14Hz),beta(15–30),andgamma(30–55Hz) toconduct time-frequencyanalysisusingthefollowingprocess:Thepowerspectral densitywasdeterminedfor30 pointsspread outover alogspace from 3 Hz to 55 Hz by extracting power from complex Morlet wavelets [9]. Eachwavelet was2 s in length andthe numberof waveletcyclesincreasedlogarithmically from3to 10in conjunc-tion with the frequencies. Mean power in each band was deter-mined by averaging each powervalue forthe evaluated frequen-cieswithineachoftheclinicalbands.Powerwasthenaggregated overatensecondslidingwindowwith9sofoverlap,allowingfor anewupdateeachsecond.

Finalfeatures were generated by determining the mean, vari-ance, skewness,andkurtosisof thepowerdistributionin eachof thesetensecondwindowsforall19EEGelectrodesitesacrossthe five frequency bands. This process yielded 380 features for each secondandapproximately9000observationsperindividualforthe fivedayperiod.Thesefeatures were then centeredandscaled by session sothat each session hada meanof zeroandvariance of one sincenoneofthealgorithmsused foranalysiswere scale in-variant.Thedataweresplitsuchthatthefirstfourdayswereused fortraining andcross-validation while thelast daywas reserved fortesting.

Toexaminetheeffectofinclusion/exclusionofparticulartypes offeatures, the test matrix shownin Table1 wasconstructed to document features includedin each test run. A totalof six algo-rithmswere usedtotrainmodels: linearSVM(SVM-L),Radial Ba-sis Function(RBF)SVM (SVM-R), feedforwardANN(ANN), deeply stacked simple RNN (RNN-D), singleLSTM (LSTM-S), and deeply stackedLSTM(LSTM-D).Allmodeldevelopmentused4-fold cross-validationtoselecthyperparametersforeachmodelandthenﬁnal modelsweretrainedusingalldatafromdays1–4.Thisprocesswas

(7)

repeatedforeachfeaturesetinTable1resultingin90final mod-elsforeachalgorithm.Finalmodelsweretrainedusingeachsetof featuresanddifferencesinclassificationaccuracyduetochoiceof algorithmandfeaturesetwereconsidered.Itisimportanttonote thatthecross-validationdatawassplitsothateachfoldwasafull dayratherthansplittingthefoldsbyrandomselectionof observa-tions fromtheentiredataset.There weretwo compellingreasons forthischoice.Thefirstreasonwasthatrandomlyselected cross-validationpointsallowfortoomanytemporallyadjacentpointsto be split between the training and test sets which artificially in-flatescross-validationaccuracy duetothe non-stationarityofthe datasets. The second reasonwas to preserve temporalcontext of thedataforprocessingwhenusingRNNs.

Once final models were produced, classification accuraciesfor the holdout day 5 test set were determined for each individual andthealgorithmaveragewascalculatedacrossparticipants.This enabledby-algorithmandby-featuresetcomparisons.Dueto con-founding effects of algorithm selection and feature sets on clas-sification accuracy, an ANOVAtest withfivefactor outcomeswas performedtoelicit theeffectofvarying levelsofeach factor.The first factor was algorithm selection which had six levels corre-sponding to each ofthe aforementioned algorithms. The remain-ingfourfactorsgroupedthefeaturesetsinabinaryfashionbased on whether a feature wasincluded or excluded froma particu-lartest run. ForexampleinTable 1,all asteriskedrunsproduced modelsthatincludedthemean,whiletheothersweremodelsthat excluded meanfeatures. Interactioneffects werenot examined.A significancelevelof

α

=0.05wasusedtoindicateifafactorhada statisticallysignificantimpactonclassificationaccuracy.TheTukey Honest SignificantDifference(HSD)test wasperformedfollowing theANOVAtodetermine whichclassificationaccuracieswere dif-ferentfromeachotherinthesix-levelalgorithmfactor,andto de-terminethedirectionandmagnitudeofdifferencesacrossall two-levelfactorcomparisons.

All neural networks were created using the Keras [6] and Theano[38]frameworks.ThefeedforwardANNhadasinglehidden layerthatwasfullyconnectedwiththeinputlayerandtheoutput layer.Theinputconsistedoftheappropriatenumberoffeaturesfor a giventest run,whiletheoutputlayerwasasinglenodewitha sigmoidactivation functionwhich forcedaclassiﬁcation aseither highorlowworkload.Mini-batchgradientdescentwasperformed using 600 observations per batch for all neural network imple-mentations,andtheAdamoptimizerwaschosentooptimize mini-batch gradientdescent duetoits abilitytohandlenon-stationary targets andnoisy data [25]. A binarycross-entropy loss function was selected as the cost function [15]. The number of nodes in the hiddenlayerwastunedby performing 4-foldcross-validation whilevaryingthenumberofnodesfrom50to800instepsof50. This resulted in 3960 cross-validation models being trained. The lowestcross-validationerrorratewasusedtodetermineboththe numberofnodestouseinthehiddenlayerandhowmanyepochs totrainthenetwork.

AsillustratedinFig.3,thedeepLSTMarchitectureconsistedof aninputlayer,thefirstsequence-to-sequenceLSTMlayer,a many-to-one LSTM layer, a 20% dropout layer, and a final sigmoid ac-tivation function for binary classification. The first hidden layer contained 50LSTM unitswhile the second hiddenlayer used 10 units. Withunlimited resources, we wouldhave tuned the num-berofhiddenlayernodesexhaustivelyusinggridcross-validation. However, due to computational resource constraints, several hid-denlayersizesweretestedforeachlayeronsmallerrepresentative sets ofdataandit wasfound thatreducing the numberofLSTM units in each layer improved generalization and that empirically, 50and10appearedtoworkwell.Eachnetworkhadalookbackof 30sofpre-processedfeaturesor,forthecaseofthesecondlayer, featuresgeneratedfromtheoutputofthefirstLSTMlayer.Dropout

Fig.3. DeepLSTMArchitecture:Thisillustratesthesizeofthetensorintermsof batchsize,temporaldepthinseconds,andnumberoffeaturesusedateachlevel ofthenetwork.Prepresentsthenumberoffeaturesusedbasedonthetestmatrix showninTable1,andrangedfrom90to380features.SinceLSTM2doesnotreturn asequence,thetensorbecomestwo-dimensionalatthatpoint.

ontheinputgatestoeachLSTMlayerandbetweenthefinalLSTM andfully-connectedsigmoidlayer servedasa methodof regular-izationandwassetto20%[15,37].Dropoutpreventsco-adaptation ofthehiddenunitsbytemporarilyremoving apercentageof ran-domlyselectednodes,includingtheirinputandoutput,inagiven layerduringatrainingpass[20].Thisforceshiddenunitstolearn featureswithoutdependinguponparticularnodestocorrect mis-takesmadeduringlearning[20].Whiledropoutisthemostwidely usedregularizationmethodfordeepneuralarchitectures,itisalso importanttounderstandwhenitisnotappropriatetouse.The re-currentconnections withintheLSTM structureareone suchcase. ThepurposeoftherecurrentconnectioninaLSTMistostore im-portant long-termdependencies.Pham et al.[33] showed that if dropoutisappliedtotherecurrentconnection,thenthelongterm memorybecomescorrupted andinhibitslearningratherthan im-proving generalization. Similar to Zaremba et al. [46], we found using a dropout of 20% on the input gates and between the fi-nalLSTMoutputandthesigmoidclassificationlayerprovided bet-ter resultsthanthe typicallyrecommended 50%dropout forfully connectedlayers.Thenumberoftrainingepochswastuned using cross-validationaspreviously specifiedin thissection. Thelength of training with the lowest cross-validation error rate for each participant/featuresetcombinationwasselectedforfinalnetwork training.Alltrainingdatawasthenusedtoretrainthenetwork.To ensurethat anomalous behaviorwas not presentinthe reported resultsfromday5,allother combinationsofcross-validationwith hold-out test day were performed for the deeply stacked LSTM model.Nosignificantdeviationsfromthereporteduponresultsin

Section 4 were observed due to permuting the test andtraining days.

Tworecurrent networks were trained which were derivatives ofthedeepLSTMarchitecture.Thedifferencesbetweenthe archi-tectures are detailed next. The deeply stacked simple RNN used thesamearchitectureasthedeepLSTMexceptthatinsteadof us-ingLSTM units,simplerecurrentunitswere used.Dueto param-eter sharing in the weight matrix for the recurrent connections andalackofgatingfunctions,thevanishinggradientproblemwas presentandrestrictedtheeffectivetemporalperiodto10–20sfor thismodeldespite30sofdatabeingprovided asinput [15].The second networkconsisted ofan input layer,a many-to-oneLSTM layerusing50 hiddenunits,andasigmoid output layer.Training andclassiﬁcationmethodsusingthesenetworksmirroredthoseof thedeepLSTMarchitecture.

TwocategoriesofSVMswere trained,onewithalinearkernel asrecommendedbyChristensenetal.[8],andoneusingaRBF ker-nel. The resulting models were used to compare neural network results to those from a more traditional machine learning algo-rithm.TotrainthelinearSVM,theappropriatenumberoffeatures

(8)

Table2

ANOVAtablewithﬁvefactors.Algorithmisasix-levelfactorindicatingthealgorithmused: LSTM-D,LSTM-S,RNN-D,ANN,SVM-L,orSVM-R.Theﬁnalfourfactorshavetwolevelsand indicatethefeaturewaseitherincludedorexcluded.

Source DF Sumofsquares R2 _F_Ratio _Prob_>_F

Algorithm 5 0.7876 0.2774 29.4052 <0.0001 Categorizedbymean 1 1.2113 0.4266 226.1277 <0.0001 Categorizedbyvariance 1 0.2579 0.0908 48.1448 <0.0001 Categorizedbykurtosis 1 0.0033 0.0011 0.6235 0.4301 Categorizedbyskewness 1 0.0028 0.0009 0.5252 0.4689 Table3

TukeyHSDresultsforallpairsofalgorithmlevels.

Model1 Model2 Diff StdErr tRatio Prob>|t| 95%ConfInt LSTM-D ANN 0.089 0.011 8.19 <0.0001 0.058to0.121 LSTM-D SVM-L 0.088 0.011 8.06 <0.0001 0.057to0.119 LSTM-D SVM-R 0.078 0.011 7.12 <0.0001 0.046to0.109 LSTM-D RNN-D 0.020 0.011 1.82 0.4536 −0.011to0.051 LSTM-D LSTM-S 0.010 0.011 0.89 0.9493 −0.022to0.041 LSTM-S ANN 0.080 0.011 7.31 <0.0001 0.049to0.111 LSTM-S SVM-L 0.078 0.011 7.17 <0.0001 0.047to0.109 LSTM-S SVM-R 0.068 0.011 6.23 <0.0001 0.037to0.099 LSTM-S RNN-D 0.010 0.011 0.93 0.9382 −0.021to0.041 RNN-D ANN 0.070 0.011 6.37 <0.0001 0.038to0.101 RNN-D SVM-L 0.068 0.011 6.24 <0.0001 0.037to0.099 RNN-D SVM-R 0.058 0.011 5.30 <0.0001 0.027to0.089 SVM-R ANN 0.012 0.011 1.08 0.8906 −0.019to0.043 SVM-R SVM-L 0.010 0.011 0.94 0.9351 −0.021to0.041 SVM-L ANN 0.001 0.011 0.13 1 −0.03to0.033

foragiventestrunwereprovidedforeachobservationsuppliedto theSVM.Thetuning parameterCseta tolerancefornumberand severityofmarginandhyperplaneviolations–effectively determin-ingsmoothnessof thedecisionsurface[24,32].This hyperparam-eterwasoptimizedusinga cross-validatedgridsearch across ex-ponentiallyspacedvaluesofCfrom10−4_to₁₀2 _resulting_in₂₅₂₀

cross-validationmodelsforthelinearSVM.Nearlyall Cvalues se-lectedforthe ﬁnal models wereeither 10−4 _or₁₀−3_. _All_training

data wasthen used to retrain the linear SVM with the selected valuesforC.The procedure fortrainingthe RBFSVM wasnearly thesameasinthelinearcase,exceptacross-validatedgridsearch overvaluesof

γ

andCwasperformedwherethehyperparameter

γ

controlled the radius ofinﬂuencefora singleobservation.The samerangeofCvalueswereevaluatedwhile

γ

wasexponentially variedfrom10−3_to₁₀2_,_resulting_in_15,120_{cross-validation}_models

fortheRBFSVM.Thebesthyperparameterswereselectedandthe ﬁnal models were trained.This procedureensured proper tuning toestablishavalidbaselineforcomparisonwithnewtechniques.

4. Results

Results of the five-factorANOVAindicate that algorithm type, meanfeatures, andvariance featureshadastatisticallysignificant effectonclassificationaccuracy (allp-values_<0_.0001).Skewness andkurtosisinthe presenceof meanandvariance werenot sig-nificant (p-values > 0.43). Table 2 summarizes the results from theANOVAwhile Tables3and5displaytheposthocTukeyHSD results. Results from the Tukey HSD test showed that the ANN

Table4

Cross-participantaveraged classiﬁcationaccuracy foreachmodelandfeature set.Mean,variance,skewness,and kurtosisfeaturesaredenotedbyM,V,S, andKrespectively.Boldvaluesaremodelswithgreaterthan90%classiﬁcation accruacy.

Featureset SVM-L SVM-R ANN LSTM-S RNN-D LSTM-D M 0.823 0.834 0.816 0.871 0.884 0.891 V 0.762 0.769 0.754 0.865 0.850 0.861 S 0.680 0.694 0.678 0.757 0.760 0.765 K 0.672 0.686 0.662 0.732 0.736 0.762 M/V 0.836 0.846 0.844 0.911 0.866 0.911 M/S 0.823 0.836 0.828 0.878 0.861 0.897 M/K 0.831 0.843 0.830 0.896 0.884 0.908 V/S 0.762 0.771 0.748 0.847 0.869 0.862 V/K 0.758 0.769 0.757 0.863 0.833 0.882 S/K 0.716 0.733 0.709 0.834 0.770 0.807 M/V/S 0.839 0.840 0.848 0.909 0.890 0.927 M/V/K 0.835 0.837 0.838 0.907 0.918 0.930 M/S/K 0.824 0.838 0.831 0.897 0.911 0.918 V/S/K 0.759 0.771 0.753 0.846 0.847 0.863 M/V/S/K 0.835 0.842 0.836 0.913 0.899 0.930

andbothSVMmodels’meanclassificationaccuracieswerenot sta-tistically different with p-values ranging from 0.8906 to 1. The deep LSTM models demonstratedstatistically significant accuracy increasesof 8.9%over ANNresultsaswell as8.8%and7.8% over linear and RBF SVM results respectively (all p-values <0.0001). The participant averaged classification accuracies for each model and feature set inTable 4 show that there are 6 feature combi-Table5

TukeyHSDresultscomparingmodelswherethefeatureofinterestwasincludedversus thosewhereitwasexcluded.

Model1 Model2 Diff StdErr tRatio Prob>|t| 95%ConfInt Mean -Mean 0.096 0.006 15.04 <0.0001 0.083to0.108 Var -Var 0.044 0.006 6.94 <0.0001 0.032to0.057 Skew -Skew 0.005 0.006 0.72 0.4689 −0.008to0.017 Kurt -Kurt 0.005 0.006 0.79 0.4301 −0.007to0.018

(9)

nationsthatresultinclassificationaccuraciesgreaterthan90% us-ing the deep LSTM architecture.The highest accuracyfeature set was attained usingall features with the deep LSTM model.This modelachievedan average classificationaccuracy of93.0%across participants. This compares favorably to 84.8% and 84.6% forthe bestANNandSVMcross-participantfeaturesetperformance. Fur-thermore,thisrepresentsa58%decreaseinerrorcomparedtothe best baseline case–the mean-only RBF SVM. These results illus-trate the importance ofaccounting fortemporal dependenciesin workloaddata.Furtherexaminingtheresultsshowsthatthe influ-enceofusingatemporally-statefulmodelonclassificationaccuracy forworkloadestimationcannotbeunderstated.Inallcasestested, every temporally-stateful model outperformed the best perform-ingnon-statefulmodelandwerefoundtobesignificantlydifferent thanallnon-statefulmodelswithTukeyHSDp-valuesall<0.0001 (Table3).

Statistically, inclusion of mean and variance were signiﬁcant, whileskewnessandkurtosiswerenot.ANOVAresultsshowa sig-niﬁcanteffectdependentuponinclusionorexclusionofmean fea-tures

(

p_<0_.0001

)

.PosthoccomparisonsusingtheTukeyHSDtest indicate that average classification accuracy formodels including the mean features results in an accuracy increase of 9.6% versus models excludingthemeanfeatures (Table5). Asignificanteffect isalsopresentdependentupon inclusionorexclusionofvariance features(p<.0001).TheTukeyHSDtestshowsthataverage classi-ficationaccuracyforincludingthevariancefeaturesincreased4.4% versusexcludingthevariancefeatures.ANOVAandTukeyHSD re-sults for skewness

(

p=0.4689

)

and kurtosis

(

p=0.4301

)

were not significant. However, by comparingmodels withandwithout a singlefeature, these results merely indicate that kurtosis does notaddsignificantlytoamodelwithskewnessincludedand vice-versa. We hypothesize that skewness and kurtosis may be more importantinsituationsinvolvingworkloadtransitions.Wedidnot investigate workload transitions in our research, but expect that transitionsbetweendifferentworkloadlevelsmayfirstbecome ap-parentinthetailsofthedistributions.

5. Conclusionandfuturework

Cross-day workloadestimationbased onEEG isa diﬃcult do-main due to temporal non-stationarity of feature-to-target map-pings.Previousresearchoncross-dayworkloadestimation implic-itly assumed independence ofa participant’s workload from one instancetothenext duetothealgorithms usedforanalysis. The-ory and practical experience show that workload can build in a cumulative fashionandthat atemporaldependenceexists. Recur-rent Neural Network (RNN) models, in particular those that use Long Short-Term Memory (LSTM) architectures, can account for bothlong-termandshort-termtemporaldependenciesinherentin brainactivity data.This workalso statisticallyevaluated the util-ityofmean,variance,skewness,andkurtosisoffrequency-domain power distributions andfoundonly themeanand varianceto be statisticallysigniﬁcant.

This research demonstrated the utility of deep RNN mod-els and particular feature sets for cross-day workload estima-tionandshowedthat drasticallyimprovedmodelaccuracycanbe achievedoverSupportVectorMachine(SVM)andfeedforward Ar-tiﬁcial Neural Network (ANN) models when working with non-independentdata.Previously,thebestaccuracyachievedusingthis datasetwas83%.OurmodelsbuiltwithdeepLSTMsincreasedthat accuracyto93.0%,representinga58%reductioninclassiﬁcation er-ror overbaseline methods anda59% decreaseinerrorcompared tothebestpublishedresultsforthisdataset.Thispusheduscloser to the95% thresholdwhere adaptiveoperator augmentationmay becomefeasible.

There is an abundance of future work to be pursued in this area.Duetotimeconstraintsandcomputationalcomplexity,onlya selectnumberofdeeparchitectureswereexaminedduringthis re-search.AthoroughevaluationofdifferentdeepRNNarchitectures toinclude variationsin thedepth of hiddenlayer recurrent con-nections, stackingof differentsized LSTMlayers, andinterleaving fully-connectedfeedforwardlayers betweensequence-to-sequence recurrentlayersmayyieldadditionalimprovement.

Another enhancement for future work would be to include workloadtransitions.We believethat skewness andkurtosis may be relevant in datasets where the target workload is transition-ing across highand low workload conditions since distributional changesmayfirstbecomeevident inthetailsofthedistributions. Other ideas to pursueinclude creating ensembles of deepRNNs. Thiswouldalmostcertainlyimproveresultsaslongasenough di-versitycould be addedtotheensemble.In ourresearch,we only supplied30ssequencestotherecurrentnetworks.Exploring vari-ationsintemporal length suppliedto a RNNto examine station-arity oftarget-to-feature mapping for workload estimation using electroencephalograph(EEG)wouldalso bean interesting subject forfuturework.DeepRNNarchitecturescouldalsobeusedto im-provecross-participantandcross-dayclassificationsimultaneously bytrainingandtestingonallparticipantsgroupedtogetherrather than individually. Finally, significant improvement could be real-izediftime-series dataaugmentation methods aredeveloped ca-pable of forcing a learned invariance to the sources of temporal non-stationarity.

Acknowledgment

WethankJustin Esteppandthe AirForce ResearchLaboratory 711th Human Performance Wing (711HPW/RHCPA) for providing thedata fromtheir MATBexperiment andother support forthis research. The views and conclusionscontained in this document are those of the authors and should not be interpreted as rep-resentingthe oﬃcial policies,either expressed orimplied, of the UnitedStatesAirForceortheU.S.Government.

References

[1] R.G.Andrzejak,K.Lehnertz,F.Mormann,C.Rieke,P.David,C.E.Elger, Indi-cations ofnonlinear deterministicandﬁnite-dimensional structuresintime seriesofbrainelectricalactivity:dependenceonrecordingregionandbrain state,Phys.Rev.E64(6)(2001)061907.

[2] P.Bashivan,I.Rish,M.Yeasin,N.Codella,LearningRepresentationsfromEEG withDeepRecurrent-ConvolutionalNeuralNetworks.Internationalconference onlearningrepresentations(2016).

[3] M. Binz, S.Otte, A.Zell, Ontheapplicabilityofrecurrent neural networks forpatternrecognitioninelectroencephalographysignals,in:WorkshopNew ChallengesinNeuralComputation2015,2015,p.85.

[4] B.Blankertz,G.Dornhege,M.Krauledat,K.-R.Müller,G.Curio,The non-inva-siveberlinbrain–computerinterface:fastacquisitionofeffectiveperformance inuntrainedsubjects,Neuroimage37(2)(2007)539–550.

[5] A.J.Casson,Artiﬁcialneuralnetworkclassiﬁcationofoperatorworkloadwith anassessmentoftimevariationand noise-enhancementtoincrease perfor-mance,Front.Neurosci.8(2014)372,doi:10.3389/fnins.2014.00372. [6] F.Chollet,Keras,2015,(https://github.com/fchollet/keras).

[7] J.C.Christensen,J.R.Estepp,Coadaptiveaidingandautomationenhance opera-torperformance,Hum.Factors(2013)965–975.

[8] J.C.Christensen,J.R.Estepp,G.F.Wilson,C.A.Russell,Theeffectsofday-to-day variabilityofphysiologicaldataonoperatorfunctionalstateclassiﬁcation, Neu-roimage59(1)(2012)57–63,doi:10.1016/j.neuroimage.2011.07.091.

[9] M.X.Cohen,AnalyzingNeuralTimeSeriesData:TheoryandPractice,TheMIT Press,2014.

[10]J.R.Comstock,R.J.Arnegard,TheMulti-attributeTaskBatteryforHuman Oper-atorWorkloadandStrategicBehaviorResearch,NASATechnicalMemorandum 104174,1992.January

[11]M.L.Cummings,A.Clare,C.Hart,Theroleofhuman-automationconsensusin multipleunmannedvehiclescheduling,Hum.Factors(2010).

[12] P.R.Davidson,R.D.Jones,M.T.Peiris,Eeg-basedlapsedetectionwithhigh tem-poralresolution,IEEETrans.Biomed.Eng.54(5)(2007)832–839.

[13]J.R.Estepp,S.L.Klosterman,J.C.Christensen,Anassessmentofnon-stationarity inphysiologicalcognitive stateassessment usingartiﬁcialneural networks, ProceedingsoftheAnnualInternational Conferenceofthe IEEEEngineering

(10)

inMedicineandBiologySociety,EMBS2011(2011)6552–6555,doi:10.1109/ IEMBS.2011.6091616.URL:http://www.ncbi.nlm.nih.gov/pubmed/22255840 [14] F.A.Gers,J.Schmidhuber,F.Cummins, Learningtoforget:continualprediction

withlstm,NeuralComput.12(10)(2000)2451–2471.

[15] I.Goodfellow,Y.Bengio,A.Courville,DeepLearning,MITPress,2016. [16] A.Graves,Neuralnetworks,in:SupervisedSequenceLabellingwithRecurrent

NeuralNetworks,Springer,2012,pp.15–35.

[17]A.Graves,A.-r.Mohamed,G.Hinton,Speechrecognitionwithdeeprecurrent neuralnetworks,in:2013IEEEInternationalConferenceonAcoustics,Speech andSignalProcessing,IEEE,2013,pp.6645–6649.

[18] N.F.Güler,E.D.Übeyli,I. Güler, Recurrent neuralnetworks employing Lya-punovexponentsforEEGsignalsclassiﬁcation,ExpertSyst.Appl.29(3)(2005) 506–514.

[19] R.G.Hefron,B.J. Borghetti, Anew featurefor cross-daypsychophysiological workloadestimation, in:Machine Learningand Applications (ICMLA),2016 15thIEEEInternationalConferenceon,IEEE,2016,pp.785–790.

[20]G.E.Hinton,N.Srivastava,A.Krizhevsky,I.Sutskever,R.R.Salakhutdinov, Im-proving neural networks by preventingco-adaptation offeature detectors, arXiv:1207.0580(2012).

[21]S.Hochreiter,J.Schmidhuber,Longshort-termmemory,NeuralComput.9(8) (1997)1735–1780.

[22]B.M.Huey,C.D.Wickens,etal.,WorkloadTransition:Implicationsfor Individ-ualandTeamPerformance,NationalAcademiesPress,1993.

[23]D.W.Jahns,Operatorworkload:whatisitandhowshoulditbemeasured,in: K.D.Cross,J.J.McGrath(Eds.),CrewSystemDesign.Proceedingsofan Intera-gencyConferenceonManagementandTechnologyintheCrewSystemsDesign Process,SantaBarbara,California,1973.

[24]G. James, D. Witten, T.Hastie, R. Tibshirani, An Introduction to Statistical Learning,vol.112,Springer,2013.

[25]D.Kingma,J.Ba,Adam:amethodfor stochasticoptimization,International conferenceonlearningrepresentationsarXiv:1412.6980(2014).

[26]A.Krizhevsky,I.Sutskever,G.E.Hinton,Imagenetclassiﬁcationwithdeep con-volutionalneuralnetworks,in:F.Pereira,C.J.C.Burges,L.Bottou,K.Q. Wein-berger(Eds.),AdvancesinNeuralInformationProcessingSystems25,Curran Associates,Inc.,2012,pp.1097–1105.

[27]S.P.Kumar, N. Sriraam, P.Benakop, B.Jinaga, Entropiesbaseddetection of epilepticseizureswithartiﬁcialneuralnetworkclassiﬁers,ExpertSyst.Appl. 37(4)(2010)3284–3291.

[28]J.D. Lee, K.A. See, Trustinautomation: Designing for appropriatereliance, Hum.Factors46(1)(2004)50–80.

[29]Z.C.Lipton,J. Berkowitz,C.Elkan,Acriticalreviewofrecurrentneural net-worksforsequencelearning,arXiv:1506.00019(2015).

[30]S.R.Liyanage,C.Guan,H.Zhang,K.K.Ang,J.Xu,T.H.Lee,Dynamicallyweighted ensembleclassiﬁcationfornon-stationaryeegprocessing,J.NeuralEng.10(3) (2013)036007.URL:http://stacks.iop.org/1741-2552/10/i=3/a=036007

[31]R.Parasuraman,T.Bahri,J.E.Deaton,J.G.Morrison,M.Barnes,Theoryand De-signofAdaptiveAutomationinAviationSystems,TechnicalReport,DTIC Doc-ument,1992.

[32]F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M.Blondel,P.Prettenhofer,R.Weiss,V.Dubourg,etal.,Scikit-learn:machine learninginpython,J.Mach.Learn.Res.12(Oct)(2011)2825–2830.

[33]V. Pham, T. Bluche, C. Kermorvant, J. Louradour, Dropoutimproves recur-rentneuralnetworksforhandwritingrecognition,in:Frontiersin Handwrit-ingRecognition (ICFHR),201414thInternationalConference on,IEEE, 2014, pp.285–290.

[34]M.E.Raichle,A.Z.Snyder,Adefaultmodeofbrainfunction:abriefhistoryof anevolvingidea,Neuroimage37(4)(2007)1083–1090.

[35]W.B.Rouse,W.R.Cody,K.R.Boff,Thehumanfactorsofsystemdesign: under-standingand enhancingthe roleofhumanfactorsengineering,Int.J.Hum. FactorsManuf.1(1)(1991)87–104,doi:10.1002/hfm.4530010108.

[36]V.Srinivasan,C.Eswaran,N.Sriraam, Approximateentropy-basedepilepticeeg detectionusingartiﬁcialneuralnetworks,IEEETrans.Inf.Technol.Biomed.11 (3)(2007)288–295.

[37]N. Srivastava, G.E. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout:asimplewaytopreventneuralnetworksfromoverﬁtting.,J.Mach. Learn.Res.15(1)(2014)1929–1958.

[38]TheanoDevelopmentTeam,Theano:aPythonframeworkforfastcomputation ofmathematicalexpressions,arXiv:1605.02688(2016).

[39]E.D. Übeyli, Analysis of eeg signals by implementing eigenvector meth-ods/recurrentneuralnetworks,DigitalSignalProcess.19(1)(2009)134–143. [40]O.Vinyals,A.Toshev,S.Bengio,D.Erhan,Showandtell:aneuralimage

cap-tiongenerator,in:ProceedingsoftheIEEEConferenceonComputerVisionand PatternRecognition,2015,pp.3156–3164.

[41]G.Wilson,In-ﬂightpsychophysiologicalmonitoring,Progr.AmbulatoryMonit. (2001)435–454.

[42]G.F.Wilson,Psychophysiologicaltestmethodsandprocedures,in:Handbook ofHumanFactorsTestingandEvaluation,2002,pp.127–156.

[43]G.F.Wilson,C.Russell,J.Monnin,J.Estepp,J.Christensen,Howdoesday-to– dayvariabilityinpsychophysiologicaldataaffectclassiﬁeraccuracy?in: Pro-ceedingsoftheHumanFactorsandErgonomicsSocietyAnnualMeeting,vol. 54,SAGEPublicationsSageCA:LosAngeles,CA,2010,pp.264–268. [44]G.F.Wilson,C.A.Russell,Real-timeassessmentofmentalworkloadusing

psy-chophysiologicalmeasuresandartiﬁcialneuralnetworks.,Hum.Factors45(4) (2003)635–643,doi:10.1518/hfes.45.4.635.27088.

[45]G.F.Wilson,C.A.Russell,Performanceenhancementinanuninhabitedair vehi-cletaskusingpsychophysiologicallydeterminedadaptiveaiding.,Hum.Factors 49(6)(2007)1005–1018,doi:10.1518/001872007X249875.

[46]W.Zaremba,I.Sutskever,O.Vinyals,Recurrentneuralnetworkregularization, arXiv:1409.2329(2014).