Contents lists available at ScienceDirect
Pattern Recognition Letters
journal homepage: www.elsevier.com/locate/patrec
Predicting sex as a soft-biometrics from device interaction swipe gestures ✩
Oscar Miguel-Hurtado a,∗, Sarah V. Stevenage b, Chris Bevan c, Richard Guest a
a School of Engineering and Digital Arts, University of Kent, Canterbury, UK
b Department of Psychology, University of Southampton, Southampton, UK
c CREATE Laboratory, University of Bath, Bath, UK
article info
Article history:
Received 15 October 2015
Available online 17 May 2016
Keywords: Soft-biometrics; Sex prediction; Swipe gestures; Feature selection; Classifiers
abstract
Touch and multi-touch gestures are becoming the most common way to interact with technology such as smartphones, tablets and other mobile devices. The latest touch-screen input capacities have tremendously increased the quantity and quality of available gesture data, which has led to the exploration of its use in multiple disciplines from psychology to biometrics. Following research studies undertaken in similar modalities such as keystroke and mouse usage biometrics, the present work proposes the use of swipe gesture data for the prediction of soft-biometrics, specifically the user’s sex. This paper details the software and protocol used for the data collection, the feature set extracted and subsequent machine learning analysis. Within this analysis, the BestFirst feature selection technique and classification algorithms (naïve Bayes, logistic regression, support vector machine and decision tree) have been tested. The results of this exploratory analysis have confirmed the possibility of sex prediction from the swipe gesture data, obtaining an encouraging 78% accuracy rate using swipe gesture data from two different directions. These results will hopefully encourage further research in this area, where the prediction of soft-biometrics traits from swipe gesture data can play an important role in enhancing the authentication processes based on touch-screen devices.
© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
1. Introduction
Soft-biometrics traits are defined as “anatomical or behavioural characteristics that provides some information about the identity of a person, but does not provide sufficient evidence to precisely determine the identity” [1]. They include characteristics such as age, ethnicity, sex¹, height, weight, scars and tattoos. These traits have been used within biometrics deployment in combination with hard-biometrics modalities such as fingerprint [2], iris [3] and face [4]. Studies have shown that the use of soft-biometrics can enhance biometrics system performance and can greatly decrease search time in large databases [1]. Whilst soft-biometrics are high level cues and as such are largely incapable of differentiating one individual from another, their use has been proposed for deployment within continuous authentication scenarios [5]. In these cases, hard-biometrics techniques are used for initial authentication, while a combination of soft-biometrics traits is used to continuously authenticate the subject.
✩ This paper has been recommended for acceptance by Maria De Marsico.
∗ Corresponding author. Tel.: +44 1227 764000; fax: +44 1227 764000.
E-mail addresses: [email protected], [email protected] (O. Miguel-Hurtado), [email protected] (S.V. Stevenage), [email protected] (C. Bevan), [email protected] (R. Guest).
¹ Sex prediction is commonly named “gender prediction” within the biometrics community. Sex refers to the biological characteristic of a person, while gender refers to the sociocultural roles. Due to this, the authors prefer the use of “sex prediction”.
In parallel, there is also a growing interest in using soft-biometrics in non-biometrics scenarios. The Human-Computer Interaction (HCI) community are looking to the prediction of traits (e.g. sex, age, handedness, emotional states) as a way to enhance the interaction between computer-based systems and users [6]. For example, the use of dynamic keystroke and mouse movement information has been suggested as a way to predict the level of stress of computer users [7]. There have also been attempts to predict emotional states such as happiness from dynamic keystroke data [8]. These kinds of predictions may be able to provide enhanced interfaces tailored to specific demographic groups and/or different emotional states.
Human-computer interaction based on touch-screen technology can be dated back to 1965 when Johnson published his work [9] using wires on a CRT device. Recently, the growth of mobile technologies such as smartphones has resulted in the ubiquity of touch as an interface methodology. Moreover, the
http://dx.doi.org/10.1016/j.patrec.2016.04.024
has not been any previous related work in soft-biometrics prediction based on touch gestures data.
In an analogous way to how keystroke and mouse data have been proposed to enhance human-computer interaction [7,8], we believe that it is possible to extract soft-biometrics traits from swipe gesture information. The predicted soft-biometrics traits can allow touchscreen computer-based systems to tailor their interaction to better suit the user’s characteristics. In addition, this information could also improve the performance of continuous authentication biometrics systems deployed in touchscreen devices.
Taking into account the aforementioned potential uses of soft-biometrics predictions from touch gesture data within both the biometrics and HCI fields, the work presented in this paper analyses the possibility of predicting a user’s sex from swipe gesture data captured via a smartphone. In particular we assess how differences in swipe direction affect prediction performance, indicating the relevant swipe gesture features for sex prediction and which classification algorithm best suits this data. Moreover, sex prediction fusion schemes based on individual swipe directions are analysed.
2. Literature review
Swipe gesture-related techniques have been proposed in several studies as an authentication method in order to enhance the access to touch-screen devices. In one of the first studies in this field, Jermyn et al. [10] proposed the use of the graphical password to replace the text-based password. This work was motivated by the graphical input capabilities of the first personal digital assistant devices (PDAs).
More recently, in [11] the authors combine the idea of Android lock pattern authentication with the dynamics of drawing such a pattern. The Android lock pattern authentication allows users to unlock their devices by drawing a pattern with the finger on a 3×3 grid of 9 circles displayed on the device touch-screen. The authors showed that security can be significantly enhanced by using the dynamic information of the finger movement while drawing this pattern. The use of touch graphical passwords for tablets with multi-touch screens has also been analysed in [12], where the dynamic information of multi-touch input sequences was analysed for user authentication. In this work, the authors report an equal error rate of 10% using a single multi-touch gesture. This work has also been used to obtain a US Patent [14].
In [13] the authors explore the possibilities of swipe gesture data (combined with accelerometer data) to perform continuous user identification for mobile devices. In this work, the interaction with the mobile device was analysed according to three gestures: Tap, Scroll and Fling. The touch-based features include touch coordinates on the screen, touch pressure and duration across three different apps (Message, Album and Twitter) with 100 users interacting with the smartphones. The results show 80% accuracy with 10 interactions for identifying a non-owner using the smartphone, and nearly 100% identification accuracy for the owner of the smartphone within six interactions.
Another example of continuous authentication using swipe gestures can be found in [15], where the authors presented the FAST (Finger-gestures Authentication System using Touchscreen) framework. In this work the six most frequently used swipe gestures were assessed: down-to-up swipe, up-to-down swipe, left-to-right swipe, right-to-left swipe, zoom in, and zoom out. Across multiple
In 1997, Wayman [16] proposed the use of soft-biometrics in order to optimise search in large surveillance databases. Since then, soft-biometrics have continuously received the attention of the biometrics research community [1]. Studies have shown the benefits of soft-biometrics traits in combination with biometrics modalities to improve both identification/verification algorithm performance and computational search times. Soft-biometrics data can help improve biometrics systems [17] and can also be used to tailor applications and the information displayed based on the user’s characteristics. Thus, the prediction of soft-biometrics, specifically a subject’s sex, has been thoroughly analysed in several biometric modalities.
Face images have been comprehensively analysed for sex prediction [18], reaching accuracy rates of 99% using frontal face images and a support vector machine (SVM) classifier [19]. In [20], the authors analysed sex prediction from unconstrained face images in which there was a high level of variation in viewpoint, pose, articulation and occlusion from the images, simulating a personal photo album. The proposed method achieved 82% accuracy for sex prediction using Poselet-level features and SVM classifiers. Gait-based sex prediction has also obtained a high accuracy rate: 98%, based on video sequences of a person walking from 11 different camera views in the same scene [21].
In a study by Amayeh et al. [22] the authors examined hand features to obtain sex prediction as a soft-biometric. Hands were analysed using image processing techniques and three different machine learning classifiers: minimum distance, k-nearest neighbours and linear discriminant analysis. A 98% accuracy rate was obtained by score-level fusion using MPEG-7 Fourier descriptors as the hand features and linear discriminant analysis as the classifier.
Biometric modalities such as iris or fingerprint, not naturally linked with everyday sex classification, have obtained reliable accuracy rates. From iris images, it has been possible to obtain an accuracy rate of 91% analysing iris texture using a local binary pattern and SVM classifier [23]. In [24] a method used on fingerprint images based on discrete wavelet transform and singular value decomposition is proposed. Using a k-nearest neighbour as a classifier, 88% accuracy is reported.
More aligned to the present work are studies undertaken on mouse and keystroke interaction. These are used as the main interaction with many computer-based systems. Idrus et al. [25] conducted a thorough analysis for profiling users from keystroke data. In this work, the authors analysed the possibility of predicting whether the user: (a) used one or two hands to type; (b) belongs to a particular age category; (c) is male or female; and (d) is right- or left-handed. Using four features based on digraphs (two consecutive keystrokes) and SVM classifiers, the authors achieved recognition rates for free text of >90% for whether the user typed with one or two hands, between 79% and 84% for user’s sex, 72–75% for age categories and 83–88% for handedness. Furthermore, recent studies have focused on predicting emotional states such as stress [7,8] and happiness [8], obtaining promising results.
To the best of our knowledge, there has not been previous work analysing the possibilities of soft-biometrics prediction from swipe gesture data.
3. Methodology
This section describes the steps undertaken within our experimental methodology, from swipe data acquisition to subject sex prediction. It details how the data were collected and which features were extracted from the raw data. It also details how those features were used in combination with feature selection techniques and machine learning algorithms in order to perform the sex prediction of the mobile users.
Fig. 1. Software for swipe gesture data acquisition.
3.1. Swipe gesture data collection
The swipe gesture data used in this study were taken from the SSD [26] dataset. The SSD is a comprehensive multi-modal biometrics dataset (face, iris, swipe, keystroke, signature, gait, hand, voice and fingerprint) along with demographics (such as sex, height, weight, handedness) created for the SuperIdentity project [27]. The SSD contains 116 participants, with a near-even sex distribution (57 males and 59 females) and ages ranging from 18 to 35 years.
The swipe gesture data were captured using a Samsung GT-I9100 ‘Galaxy S2’ smartphone. The GT-I9100 has a 4.3-inch capacitive touchscreen display (W480 x H800 pixels, 219 dpi). No screen protectors or external cases were used. Participants were instructed to operate the smartphone one-handed (according to their preference) in portrait orientation, using only the thumb of the same hand to interact with the screen (Fig. 1). The swipe gestures were captured using an Android OS application that was custom built for this purpose. The application detected and automatically recorded all swipe gestures made in four directions (left-to-right, right-to-left, up-to-down and down-to-up). To elicit swipe gestures, the capture application used a simple reading task. Participants were presented with a series of short jokes (in random order) presented as slides. To complete the task (to read each joke and its punchline), participants were required to perform a swipe gesture in the direction indicated on the screen. Each participant submitted 120 swipe gestures in total, divided evenly across the four directions.
Raw data were recorded about each swipe and were stored locally in a text file. This included x and y positional information, pressure and thickness time series.
3.2. Feature extraction
After the collection of the dataset, a pre-processing step was performed in order to remove swipes which were not properly acquired (due to software or user input errors) or were too short for feature extraction (fewer than 4 sample points). From the raw data, instantaneous swipe speed and acceleration were derived as first and second time derivatives. The arc distance between the swipe gesture and an imaginary line joining the start and end point was also calculated. Fig. 2 depicts these and other extracted geometry features such as swipe height, width and area.
Fig. 2. Swipe feature details.
Table 1
Swipe feature set.
#    Description                  #    Description
1    Total length (px)            8    Maxima speed (px/ms)
2    Total time (ms)              9    Average speed (px/ms)
3    Width (px)                   10   Maxima acceleration (px/ms²)
4    Height (px)                  11   Average acceleration (px/ms²)
5    Area (px²)                   12   Average arc distance (px)
6    Average thickness (px)       13   Max arc distance (px)
7    Average pressure             14   Angle start to end (degrees)
From the time series (x and y position, speed, acceleration, pressure, thickness and arc distance) 14 features were extracted, which are detailed in Table 1 along with their units. Screen pixels (px) are used as the unit for screen movements and milliseconds (ms) for time differences. The Android system provides the thickness, as the approximate size of the touch contact area, also in pixels, whilst pressure is provided without any unit scale.
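As an illustration of how the features in Table 1 could be derived from one swipe's time series, the following is a minimal sketch and not the authors' implementation. Several choices are assumptions made here: area is taken as bounding-box width times height, average speed is the mean of the instantaneous speeds, acceleration is a finite difference of successive speeds, timestamps are assumed strictly increasing, and the function and key names are hypothetical.

```python
import math
from statistics import mean


def extract_swipe_features(t, x, y, pressure, thickness):
    """Compute 14 swipe features from one swipe.

    t is in ms, x and y in px; all sequences have the same length.
    Timestamps are assumed to be strictly increasing (assumption).
    """
    n = len(t)
    if n < 4:
        raise ValueError("swipe too short for feature extraction")

    # Segment lengths and time steps between consecutive samples
    seg = [math.hypot(x[i + 1] - x[i], y[i + 1] - y[i]) for i in range(n - 1)]
    dt = [t[i + 1] - t[i] for i in range(n - 1)]

    # First and second time derivatives by finite differences (assumed scheme)
    speed = [s / d for s, d in zip(seg, dt)]                      # px/ms
    accel = [(speed[i + 1] - speed[i]) / dt[i + 1]
             for i in range(len(speed) - 1)]                      # px/ms^2

    # Arc distance: perpendicular distance of each point to the start-end line
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    chord = math.hypot(dx, dy)
    if chord == 0:
        # Degenerate swipe: fall back to distance from the start point
        arc = [math.hypot(x[i] - x[0], y[i] - y[0]) for i in range(n)]
    else:
        arc = [abs(dx * (y[i] - y[0]) - dy * (x[i] - x[0])) / chord
               for i in range(n)]

    width = max(x) - min(x)
    height = max(y) - min(y)
    return {
        "total_length": sum(seg),
        "total_time": t[-1] - t[0],
        "width": width,
        "height": height,
        "area": width * height,          # bounding-box area (assumption)
        "avg_thickness": mean(thickness),
        "avg_pressure": mean(pressure),
        "max_speed": max(speed),
        "avg_speed": mean(speed),
        "max_accel": max(accel),
        "avg_accel": mean(accel),
        "avg_arc_distance": mean(arc),
        "max_arc_distance": max(arc),
        "angle_start_to_end": math.degrees(math.atan2(dy, dx)),
    }
```

For a perfectly straight horizontal swipe sampled at constant speed, this sketch returns zero arc distances, zero acceleration and a start-to-end angle of 0 degrees.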
These 14 features were extracted from each swipe, and each feature averaged across all samples from each subject for each swipe direction (left-to-right, right-to-left, up-to-down and down-to-up), resulting in 4×14 features for each subject.
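The per-subject averaging described above can be sketched as follows. This is an illustrative sketch only: the input layout (a list of tuples) and the function name are hypothetical, and the ordering of the four directions inside the output vector is an arbitrary choice made here.

```python
from collections import defaultdict
from statistics import mean

DIRECTIONS = ("left-to-right", "right-to-left", "up-to-down", "down-to-up")


def subject_feature_vectors(swipes):
    """Average swipe features per subject and direction.

    swipes: iterable of (subject_id, direction, feature_list) tuples.
    Returns {subject_id: concatenated per-direction feature means}.
    """
    grouped = defaultdict(list)
    for subject, direction, features in swipes:
        grouped[(subject, direction)].append(features)

    vectors = {}
    for subject in sorted({s for s, _ in grouped}):
        vec = []
        for direction in DIRECTIONS:
            samples = grouped.get((subject, direction))
            if not samples:
                raise ValueError(f"no swipes for {subject} / {direction}")
            # Average each feature column over this subject's swipes
            vec.extend(mean(column) for column in zip(*samples))
        vectors[subject] = vec
    return vectors
```

With 14 features per swipe, each subject ends up with a vector of 4×14 = 56 values.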
3.3. Feature differences between male and female groups
A Wilcoxon rank-sum test analysis was performed in order to determine whether mean feature values from the male and female groups were significantly different. The Wilcoxon rank-sum test was used as it was not possible to assume that feature values are normally distributed. This conclusion was reached after applying Lilliefors tests to the distributions.
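For readers unfamiliar with the test, a minimal sketch of a two-sided Wilcoxon rank-sum test is given below. It uses the large-sample normal approximation with midranks and a tie correction, and no continuity correction; the paper does not state which variant or software was used, so these are assumptions, and the function name is hypothetical.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (z, p_value). Tied values receive midranks and the variance
    is corrected for ties. Assumes the pooled values are not all equal.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    # Pool both groups, remembering the group of origin (0 = a, 1 = b)
    pooled = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)

    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0          # average of ranks i+1 .. j
        count = j - i
        tie_term += count ** 3 - count
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    mean_w = n1 * (n + 1) / 2.0
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (rank_sum_a - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail probability
    return z, p_value
```

In the setting of this section, `a` and `b` would hold one feature's per-subject averages for the male and female groups respectively, with one test per feature and swipe direction.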
The Wilcoxon rank-sum test revealed differences between the male and female populations in several swipe features, specifically in the down-to-up direction: Width, Area and Angle Start to End; and in the left-to-right direction: Total Time, Average Speed, Average Arc Distance and Max Arc Distance. For the two remaining directions, the up-to-down direction only showed significant differences in the Width feature, whereas the right-to-left direction failed to show any significant difference.
These results are comparable to those found in [28] using a similar swipe gesture dataset. In [28], the authors concluded that swipe features are linked to specific physical characteristics of the hand. Users with longer thumbs performed swipe gestures with higher speed and acceleration and, therefore, shorter completion times. This work also highlighted that, on average, male subjects have longer thumbs than females. Based on these differences between the male and female populations, this work analyses the use of prediction tools such as feature selection techniques combined with machine learning classifiers in order to find the best combination of features for sex prediction based on swipe gestures.
has been successfully used in a broad range of scientific fields from speech segmentation [30] to medical domains [31].
Our method employed an initial feature selection step to identify and select the most promising features for sex prediction. After this selection, the predictive power of these feature sets was analysed through 10-fold cross validation using, separately, four machine learning classifiers. These steps are explained in the following subsections, indicating the algorithm setting values used to ensure reproducibility.
3.4.1. Feature selection
Feature selection has been carried out in WEKA v3.7 using the classifier subset evaluator (“ClassifierSubsetEval” v1.0.4) in combination with the BestFirst [32] attribute selection implementation. Both the classifier subset evaluator and the BestFirst feature selection algorithm (included within WEKA) have been used with their default setting values.
The BestFirst feature selection algorithm searches the attribute space greedily in one of three possible directions: forward, backward or bidirectional. Our experimentation used all three directions independently in order to identify an optimal selection.
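To make the idea of a greedy wrapper search concrete, the sketch below shows plain forward selection driven by an arbitrary merit function (for example, the accuracy of a classifier trained on the candidate subset). It is a simplification and not WEKA's BestFirst: the backtracking that BestFirst adds to greedy hill climbing is omitted, only the forward direction is shown, and the function and parameter names are hypothetical.

```python
def forward_selection(n_features, evaluate):
    """Greedy forward wrapper feature selection (simplified sketch).

    evaluate(subset) takes a tuple of feature indices and returns a merit
    value where higher is better, e.g. a classifier's accuracy.
    Returns (sorted list of selected indices, best merit found).
    """
    selected = []
    best_merit = evaluate(tuple(selected))
    improved = True
    while improved:
        improved = False
        best_candidate = None
        for f in range(n_features):
            if f in selected:
                continue
            merit = evaluate(tuple(sorted(selected + [f])))
            if merit > best_merit:
                # Remember the single best addition in this round
                best_merit, best_candidate = merit, f
                improved = True
        if best_candidate is not None:
            selected.append(best_candidate)
    return sorted(selected), best_merit
```

A backward search would start from the full feature set and remove one feature per round; a bidirectional search considers both additions and removals at each step.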
As an exploratory analysis, and due to the WEKA implementation of the feature selection algorithms, all the samples from the dataset were used at the feature selection step.
The feature selection step identifies the most promising feature subsets for each classifier and for each feature selection search direction. These feature subsets are then used to create the sex prediction models, whose success ratios are evaluated using 10-fold cross-validation.
3.4.2. Machine learning classifiers
The classifiers used in this study have been used successfully in different fields as machine learning engines [30,31,33]. Four different machine learning classifiers have been tested as possible candidates to predict the sex of the subject. The chosen classifiers cover a range of popular modes of classification: decision trees (J48), probabilistic (naïve Bayes), support vector machines (SVM) and logistic regression, and have been selected for complementarity in assessment.
Decision tree (J48): Decision tree learning is one of the most commonly used algorithms for automatic learning. Decision trees are composed of nodes (which test the value of an attribute), branches (the path to follow based on the attribute value) and leaves (which provide the classification of the instance). The decision tree employed in this work is the C4.5 implementation developed by Quinlan [34], implemented in WEKA as the J48 algorithm.
The setting values used for the decision tree classifier (J48) are detailed in Table 2:
Table 2
Decision tree setting values.
Option                Value     Option              Value
BinarySplits          False     SaveInstanceData    False
CollapseTree          True      Seed                1
ConfidenceFactor      0.25      SubtreeRaising      True
MinNumObj             2         Unpruned            False
NumFolds              3         UseLaplace          False
ReduceErrorPruning    False     UseMDLcorrection    True
Table 3
Support vector machine setting values.
Option                Value     Option                  Value
Cost                  1.0       Nu                      0.5
Degree                3         ProbabilityEstimates    False
Eps                   0.001     Seed                    1
Gamma                 0.0       Shrinking               True
Table 4
Multilinear logistic regression setting values.
Option                Value     Option                         Value
MaxIts                -1        Ridge                          1.0E-8
Table 5
Naïve Bayes setting values.
Option                Value     Option                         Value
UseKernelEstimator    False     useSupervisedDiscretization    False
Support vector machine (SVM): Support vector machines were introduced by Cortes and Vapnik [35] in 1995 and have been successfully used in a wide range of different areas. SVM algorithms are based on finding the optimal separating hyperplane that maximizes the margin, in other words, the hyperplane that gives the largest minimum distance to the training examples. The implementation used in this investigation is LibSVM v1.0.6 [36] as an add-on to the WEKA system. The options selected for the SVM classifier are given in Table 3.
Multilinear logistic regression: Multilinear logistic regression [37] is one of the most commonly used tools for discrete data analysis. Multinomial logistic regression is used to predict the probabilities of the different classes analysed given a set of independent variables. It represents a particular solution to the classification problem that assumes that a linear combination of the observed features can be used to determine the probability of each particular outcome of the dependent variable.
Table 4 details the setting values used for the multilinear logistic regression classifier.
Naïve Bayes: The naïve Bayes classifier [38] is based on the probabilistic Bayes’ rule and is particularly suited when the dimensionality of the inputs is high. In order to reduce the complexity of the high dimensionality, the naïve Bayes classifier assumes that the effect of the value of a particular feature on a given class is independent of the values of the other predictors. Despite its over-simplified and generally unrealistic assumptions, the naïve Bayes classifier has been shown to perform remarkably well in a wide range of applications such as text classification [38] and internet traffic identification [39].
This classifier is implemented in WEKA and was analysed with the settings listed in Table 5.
3.4.3. Sex prediction evaluation
Fig. 3. Down-to-up sex prediction score distribution.
Fig. 4. Left-to-right sex prediction score distribution.
Fig. 5. Right-to-left sex prediction score distribution.
The machine learning models (following the feature selection step) were created for the four different classifiers and the four swipe directions. The evaluation of the models was undertaken by means of 10-fold cross validation. This method is a model validation technique to estimate the performance of a statistical prediction model. The technique randomly partitions the original sample data into 10 equal sized subsamples. Nine of the subsamples are used for training the model and the remaining subsample is used as the validation data for testing the model. To reduce variability, this process is repeated 10 times, each time using a different subsample for validation.
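The evaluation protocol can be sketched in a few lines. The sketch below partitions the samples into k folds, trains on k-1 of them and tests on the remaining one, and optionally repeats the whole procedure over several random seeds to report a mean accuracy with an approximate 95% interval. It is illustrative only: the folds are not stratified, the interval uses a normal approximation (1.96 standard errors), and `train_fn` is a hypothetical callable that returns a prediction function; none of these details are taken from the paper.

```python
import math
import random
from statistics import mean, stdev


def k_fold_indices(n_samples, k, seed):
    """Shuffle sample indices and split them into k near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, train_fn, k=10, seed=1):
    """Accuracy over one k-fold cross validation run.

    train_fn(X_train, y_train) must return a function predict(x) -> label.
    """
    correct = 0
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = train_fn([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in held_out)
    return correct / len(X)


def repeated_cv(X, y, train_fn, k=10, seeds=range(1, 26)):
    """Mean accuracy and 95% half-width over repeated runs (normal approx.)."""
    accs = [cross_val_accuracy(X, y, train_fn, k, s) for s in seeds]
    half_width = 1.96 * stdev(accs) / math.sqrt(len(accs))
    return mean(accs), half_width
```

Changing the seed changes which observations fall into each fold, so repeated runs give slightly different accuracies whose spread can be summarised by the interval.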
The 10-fold cross validation was carried out 25 times using different random seed numbers (values from 1 to 25, which ensure that fold observations are different on each evaluation) in order to obtain a statistically reliable average of the model performance, with confidence intervals (α = 0.05) for the average accuracy of around 1%.
4. Results
The results obtained from the evaluation of the machine learning models created are presented in this section. In the first subsection the sex prediction performance for each classifier and each swipe direction is reported. Following this, the score distributions for the best classifier of each direction are presented as an introduction to the last subsection, which attempts to enhance sex prediction performance by means of score and decision fusion techniques.
4.1. Individual directions sex prediction performance
Figs. 3–6, one figure for each swipe direction, show the performance of the four classifiers analysed using the feature set selected by the BestFirst algorithm for the three different search directions.
Fig. 6. Up-to-down sex prediction score distribution.
As can be seen in these figures, there are clear differences in performance between the four swipe gesture directions. The down-to-up and left-to-right directions both show the highest accuracy rates of around 71%, being significantly higher than the accuracies from right-to-left and up-to-down, which average around 65%. Down-to-up and left-to-right swipe gestures involve the extension of the thumb whilst up-to-down and right-to-left involve flexion. Subjects might be more stable in extension movements and, therefore, the classification of these movements obtains a higher classification performance. These results are consistent with the Wilcoxon rank-sum test results. More significant differences were found for these two directions (down-to-up and left-to-right), which could be explained by a greater difference in swipe gesture performance between the male and female groups for these two directions.
Regarding the most suitable classifier algorithms for sex prediction based on swipe gesture data, the multilinear logistic regression classifier shows good accuracy rates across all four swipe directions. Only the naïve Bayes algorithm slightly bettered the logistic classifier, for the left-to-right swipe direction. Table 6 summarises the best accuracy rates (and their average accuracy confidence interval, CI) for each swipe gesture direction, giving further details such as the feature set used and the feature selection search direction.
The best classifier (71.8% accuracy), for the left-to-right swipe gesture features, is based on average thickness (feature ID 6), maxima speed (feature ID 8) and the average arc distance (feature ID 12). For this swipe direction, women presented a higher average arc distance (6.57 px for females compared with 5.21 px for males), slightly lower thickness (40.6 px for females compared with 41.6 px for males) and lower maxima speed (2.54 px/ms for females compared with 2.77 px/ms for males). The average thickness swipe gesture feature was also selected within the feature set for the four swipe directions. It is also worth highlighting that the average pressure (feature ID 7) was selected within three of the directions. These two features, thickness and pressure, are closely related. Thickness is the approximate size of the touch contact area. The pressure is not directly measured by the Android smartphone used in our experimentation; it is estimated from the size of the touch point on the assumption that more pressure flattens the finger against the screen. Height (feature ID 4), area (feature ID 5) and maxima acceleration (feature ID 10) were included in both the down-to-up and up-to-down swipe gesture direction models, whilst total time (feature ID 2), maxima speed (feature ID 8) and average acceleration (feature ID 11) were each included in two swipe direction models.
In order to analyse the influence of each feature on the prediction models, Table 7 details the coefficient values for the multilinear logistic regression models (the naïve Bayes model has not been included due to the lack of detailed model information in the WEKA machine learning suite).
Up-to-down Logistic Bi-dir. 1,4,5,6,7,8,10,13 64.3±1.1%
Table 7
Multilinear logistic regression coefficients for the sex prediction models.
Feature ID                    DU        RL        UD
1 Total length                –         –         46.13
2 Total time                  10.8      -1.29     –
3 Width                       –         –         –
4 Height                      -11.55    –         -48.39
5 Area                        2.64      –         -0.18
6 Average thickness           15.09     20.27     12.13
7 Average pressure            -14.39    -20.15    -13.20
8 Maxima speed                –         –         3.03
9 Average speed               11.09     –         –
10 Maxima acceleration        4.04      –         -3.55
11 Average acceleration       -9.10     -1.68     0.90
12 Average arc distance       –         –         –
13 Max arc distance           –         –         –
14 Angle start to end         -1.0      –         –
Fig. 7. Down-to-up sex prediction score distribution.
To calculate these coefficients, all the features have been normalised (mean 0 and standard deviation 1) so that the coefficients can be compared with each other. In Table 7, the higher the value of a coefficient, the higher the “log odds” increment of being female. It is worth highlighting the high weight values of “Total length” and “Height” for the up-to-down swipe sex prediction model, which are remarkably higher than the rest of the coefficients. Also of interest are the opposite signs of the “Average Thickness” and “Average Pressure” weights for these three swipe directions. As mentioned before, the pressure values are estimated by the Android operating system from the thickness values. The different signs and the similar weight values will mitigate the overall importance of these two features. An increase in the “Average Thickness” values will imply an increase in the “log odds” of being female; however, it will also imply an increase in the “Average Pressure”, which will lead to a decrease in the “log odds” of being female due to the negative sign of its weight. Due to the contradictory signs of these two values, the overall importance of thickness and pressure can be considered low.
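The cancellation argument can be checked numerically. The sketch below standardises a feature to zero mean and unit standard deviation and then evaluates the logistic model's log odds for the two touch-size features, using the down-to-up coefficients from Table 7. The intercept is not reported in the paper and is set to zero here purely for illustration, the population standard deviation is an assumed choice, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_score(values):
    """Standardise values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)   # population std (assumption)
    return [(v - mu) / sigma for v in values]


def log_odds_female(z_features, coefficients, intercept=0.0):
    """Linear predictor of a binary logistic model (intercept is hypothetical)."""
    return intercept + sum(coefficients[name] * z
                           for name, z in z_features.items())


def sigmoid(log_odds):
    """Convert log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Down-to-up coefficients for the two related features, from Table 7
coef = {"avg_thickness": 15.09, "avg_pressure": -14.39}

# A swipe one standard deviation above the mean in both features
net = log_odds_female({"avg_thickness": 1.0, "avg_pressure": 1.0}, coef)
prob = sigmoid(net)
```

When both features move together by one standard deviation, the net change in log odds is 15.09 - 14.39 = 0.70, far smaller than either coefficient taken alone, which is the mitigation effect described above.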
4.2. Direction score distributions
In Figs. 7–10, the sex prediction score distributions for each swipe gesture direction using the best classifier identified in Table 6 are depicted. The sample scores obtained from each swipe direction classifier represent the probability that the input sample belongs to a female user, on a scale from 0 to 1. If the score is close to 0, the probability that the input sample belongs to a male user is very high. On the other hand, if the score is close to 1, it likely belongs to a female user. These figures show the male score distribution in light grey, the female score distribution in dark grey, and their overlap in mid-grey. It can be seen that the distribution from each direction follows a different pattern. Specifically, it can be observed that the down-to-up score graph has a lower overlap between the male and female distributions. This lower overlap was the reason that this direction obtained the highest accuracy performance (see Table 6). These differences will be used to analyse several fusion techniques in order to improve the accuracy rates obtained from individual swipe gesture directions.
Fig. 8. Left-to-right sex prediction score distribution.
Fig. 9. Right-to-left sex prediction score distribution.
Fig. 10. Up-to-down sex prediction score distribution.
4.3. Fusion scheme sex predictions results
Fusion techniques are commonly used in multi-biometrics systems [40]. These systems have been proven to obtain enhanced performance over individual modality biometrics systems. Fusion can be performed at different levels:
(i) Feature level: where the features from different biometrics modalities are combined. The combined feature set will be used as an input to the matching algorithm.
(ii) Matching score level: where the matching score levels from each biometrics modality are combined to obtain a single score level.
(iii) Decision level: where the binary decisions of the classification algorithms are combined to reach a single decision.
In this work, matching score level and decision level techniques will be analysed. These fusion techniques have been selected due to their popularity and success in previous studies [40], along with their simplicity in implementation.
Fig. 11. Up-to-down sex prediction score distribution.
For matching score level fusion, the following matching score level rules have been analysed and applied to the prediction of sex based on swipe gesture scores:
(i) Weighted sum of scores: $S_{fusion} = \sum_i w_i \cdot s_i$
(ii) Product of scores: $S_{fusion} = \prod_i s_i$
(iii) Maximum score: $S_{fusion} = \max\{s_i\}$
(iv) Minimum score: $S_{fusion} = \min\{s_i\}$
(v) Median score: $S_{fusion} = \mathrm{median}\{s_i\}$
where $w_i$ is the weight of the $i$th swipe direction with $\sum_i w_i = 1$ and $s_i$ is the score obtained from the $i$th swipe direction classifier.
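The five score-level rules translate directly into code. The following is a minimal sketch in which `scores` holds one female-probability score per swipe direction; the function name and rule labels are hypothetical, and the example scores are made up for illustration rather than taken from the dataset.

```python
import math
from statistics import median


def fuse_scores(scores, rule, weights=None):
    """Combine per-direction scores into a single fused score."""
    if rule == "weighted_sum":
        if weights is None or len(weights) != len(scores):
            raise ValueError("one weight per score is required")
        if abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("weights must sum to 1")
        return sum(w * s for w, s in zip(weights, scores))
    if rule == "product":
        return math.prod(scores)
    if rule == "max":
        return max(scores)
    if rule == "min":
        return min(scores)
    if rule == "median":
        return median(scores)
    raise ValueError(f"unknown fusion rule: {rule}")


# Illustrative scores for four swipe directions (made-up values)
example = [0.2, 0.8, 0.6, 0.4]
fused = fuse_scores(example, "weighted_sum", weights=[0.25, 0.25, 0.25, 0.25])
```

With equal weights the weighted sum reduces to the mean of the scores; setting a weight to zero removes that direction from the fusion.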
The decision level approach was implemented by using the individual swipe gesture direction classifier decisions. Three different voting thresholds have been analysed: (i) at least one classifier across the four swipe directions classified the subject as female, (ii) at least two and (iii) at least three. Otherwise, if the number of classifiers labelling the subject as female is lower than the threshold, the sample is considered to be from a male subject.
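The voting rule described above amounts to counting "female" decisions and comparing the count with a threshold. A minimal sketch follows; the function name and the label strings are hypothetical.

```python
def vote_sex(decisions, threshold):
    """Decision-level fusion by threshold voting.

    decisions: one boolean per swipe direction classifier,
               True meaning the classifier labelled the sample as female.
    threshold: minimum number of female votes needed to output "female".
    """
    female_votes = sum(1 for d in decisions if d)
    return "female" if female_votes >= threshold else "male"
```

For example, with decisions `[True, True, False, False]` the sample is labelled female under thresholds of one or two votes, and male under a threshold of three.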
Fig. 11 shows the accuracy obtained when the score level fusion techniques were applied to the swipe gesture prediction scores.
From all the possible weight combinations, the highest accuracy was obtained by the weighted sum of scores fusion technique, which achieves a 77.1% accuracy rate. The best combination of weights found was:
$S_{fusion} = 0.3 \cdot S_{UD} + 0.7 \cdot S_{LR}$
It can be seen how the two best swipe directions in terms of accuracy, left-to-right (71.8% accuracy) and up-to-down (70%), account for most of the score fusion, with a higher weight for the best swipe direction.
This combination of swipe direction scores represents an improvement of around 5 percentage points compared with the best individual swipe gesture direction sex prediction rate, 71.8%.
The accuracy rates obtained when using the decision level fusion techniques are presented in Fig. 12.
This approach obtains 78.2% accuracy when two or more swipe gesture directions agree on the sex of the user. This rate represents an improvement of around 6 percentage points over the best individual accuracy rate.
5. Conclusion and future work
The increasing adoption of touch-screen devices and the continuous enrichment of their data capture will bring the possibility of collecting high quality swipe gesture data from user interactions and the opportunity of using these data to predict soft-biometrics such as sex, age category, single- or two-handed usage, handedness or even emotions.
Fig. 12. Up-to-down sex prediction score distribution.
This soft-biometric information can be used to improve authentication systems (e.g. continuous authentication based on mobile device use), to tailor application interfaces to specific user groups, or to enhance the interaction between computer-based systems and users.
Following these ideas, this paper has analysed the possibility of sex prediction using swipe gesture data collected from the user interaction with a touch-screen device. This interaction involves the placement of a finger on the screen followed by a fast movement in one specific direction. These gestures are frequently used with touch-screen devices while navigating websites, list menus or picture galleries.
The results of this exploratory analysis have confirmed the possibility of sex prediction from the swipe gesture data, obtaining an encouraging 71% accuracy from an individual swipe gesture direction. Furthermore, the results have shown a significant difference in the sex prediction power based on swipe directions. The swipe directions involving finger extension (down-to-up and left-to-right) obtained around 71% accuracy, while the swipe directions involving finger flexion (up-to-down and right-to-left) obtained around 65% accuracy.
Regarding the most suitable machine learning classifier for this task, the multilinear logistic regression has shown good performance across all swipe gesture directions, only slightly bettered by the naïve Bayes classifier for the left-to-right swipe direction.
We have also analysed sex prediction accuracy when combining the data from the four directions. Several score level and decision level fusion techniques have been implemented. The results of this analysis showed that the combination of swipe direction data enhanced the sex prediction accuracy by about 6 percentage points, achieving a 78% accuracy rate using a decision voting scheme.
It is important to acknowledge the limitation of this research in drawing general conclusions from small sample populations. Yet, it is valuable to acknowledge that this kind of information can be extracted from swipe gesture data, pointing to further research in this area. This will help the research community to find ways to use this information to enhance the user interaction with technology and also to improve continuous authentication on mobile devices. Moreover, the acknowledgement of the potential prediction of this kind of information can be essential to prevent this leak of information when it could imply a privacy risk.
These results will hopefully encourage further research in this area. Predicting soft-biometric information from swipe gestures is a new field in biometrics and human-computer interaction and, therefore, there is scope for improvements in all data analysis steps. Specifically for the work presented in this paper, the analysis of sex prediction based on swipe gesture data, the following areas should be investigated:
- Feature extraction: the swipe gesture is a multi-dimensional sequence of numerical data points. This characteristic allows the creation of new swipe features that can be analysed to improve [...]
- Classifiers: specific parameterisation of classifiers could also bring improved accuracies. Furthermore, the use of ensembles of classifiers could be another option for improving the performance.
- Swipe gesture directions: thorough analysis of the differences between swipe features from different swipe gesture directions and how they impact on the sex prediction performance can enable better fusion strategies of swipe features from different directions.
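As an illustration of the feature extraction item above, the following Python sketch derives a few simple features from a swipe represented as a time-ordered sequence of (x, y, t) samples. The representation, the function name `swipe_features` and the chosen features are assumptions for illustration; they are not the feature set used in this paper.

```python
import math


def swipe_features(points):
    """Compute simple illustrative features from a swipe given as a
    time-ordered list of (x, y, t) samples (hypothetical representation)."""
    (x0, y0, t0), (x1, y1, t1) = points[0], points[-1]
    # Length of the trajectory actually travelled by the finger.
    path_length = sum(
        math.hypot(bx - ax, by - ay)
        for (ax, ay, _), (bx, by, _) in zip(points, points[1:])
    )
    duration = t1 - t0
    # Straight-line distance between the first and last samples.
    displacement = math.hypot(x1 - x0, y1 - y0)
    return {
        "duration": duration,
        "path_length": path_length,
        "displacement": displacement,
        "mean_speed": path_length / duration if duration > 0 else 0.0,
        "straightness": displacement / path_length if path_length > 0 else 1.0,
    }


# Hypothetical swipe: three samples along a straight line, times in seconds.
swipe = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.1), (6.0, 8.0, 0.2)]
features = swipe_features(swipe)
```

Features of this kind could be computed per swipe direction and passed to the feature selection and classification steps.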
Moreover, the prediction of other soft-biometric traits such as age categories, handedness, stress detection and emotion recognition are other areas where swipe gesture data could lead to results with practical applications.
Acknowledgments
This work was supported by UK EPSRC Grant EP/J004995/1 (SID: An Exploration of SuperIdentity). The authors would like to thank colleagues on this grant for helpful comments on this work.