• No results found

FDSTools: A software package for analysis of massively parallel sequencing data with the ability to recognise and correct STR stutter and other PCR or sequencing noise

N/A
N/A
Protected

Academic year: 2021

Share "FDSTools: A software package for analysis of massively parallel sequencing data with the ability to recognise and correct STR stutter and other PCR or sequencing noise"

Copied!
14
0
0

Loading.... (view fulltext now)

Full text

(1)

Research

paper

FDSTools:

A

software

package

for

analysis

of

massively

parallel

sequencing

data

with

the

ability

to

recognise

and

correct

STR

stutter

and

other

PCR

or

sequencing

noise

Jerry

Hoogenboom

a,b,

*

,1

,

Kristiaan

J.

van

der

Gaag

a,b,1

,

Rick

H.

de

Leeuw

a

,

Titia

Sijen

b

,

Peter

de

Knijff

a

,

Jeroen

F.J.

Laros

a

a

DepartmentofHumanGenetics,LeidenUniversityMedicalCenter,Leiden,2300RC,TheNetherlands

b

DivisionBiologicalTraces,NetherlandsForensicInstitute,LaanvanYpenburg6,2497GB,TheHague,TheNetherlands

ARTICLE INFO

Articlehistory:

Received26September2016

Receivedinrevisedform31October2016 Accepted23November2016

Availableonline27November2016

Keywords: Forensicscience

Nextgenerationsequencing Massivelyparallelsequencing Shorttandemrepeat Powerseq

FDSTools Software

ABSTRACT

Massivelyparallelsequencing(MPS)isontheadventofabroadscaleapplicationinforensicresearchand

casework.Theimprovedcapabilitiestoanalyseevidentiarytracesrepresentingunbalancedmixturesis

oftenmentionedasoneofthemajoradvantagesof thistechnique.However,mostoftheavailable

softwarepackagesthatanalyseforensicshorttandemrepeat(STR)sequencingdataarenotwellsuited

forhighthroughputanalysisofsuchmixedtraces. Thelargestchallengeisthepresenceofstutter

artefactsinSTRamplifications,whicharenotreadilydiscernedfromminorcontributions.FDSToolsisan

open-sourcesoftwaresolutiondevelopedforthispurpose.Thelevelofstutterformationisinfluencedby

variousaspectsofthesequence,suchasthelengthofthelongestuninterruptedstretchoccurringinan

STR.WhenMPSis used,STRsareevaluatedassequencevariantsthateachhave particularstutter

characteristicswhichcanbepreciselydetermined.FDSToolsusesadatabaseofreferencesamplesto

determinestutterandothersystemicPCRorsequencingartefactsforeachindividualallele.Inaddition,

stuttermodelsarecreatedforeachrepeatingelementinordertopredictstutterartefactsforallelesthat

arenotincludedinthereferenceset.Thisinformationissubsequentlyusedtorecogniseandcompensate

forthenoiseinasequenceprofile.Theresultisabetterrepresentationofthetruecompositionofa

sample.UsingPromegaPowerseqTMAutoSystemdatafrom450referencesamplesand31two-person

mixtures,weshowthattheFDSToolscorrectionmoduledecreasesstutterratiosabove20%tobelow3%.

Consequently,muchlowerlevelsofcontributionsinthemixedtracesaredetected.FDSToolscontains

modules tovisualisethedata inaninteractiveformat allowinguserstofilterdata withtheirown

preferredthresholds.

ã2016TheAuthors.PublishedbyElsevierIrelandLtd.ThisisanopenaccessarticleundertheCC

BY-NC-NDlicense(http://creativecommons.org/licenses/by-nc-nd/4.0/).

1.Introduction

AnalysisofShortTandemRepeats(STRs)hasbeenasuccessful

forensic tool in the past two decades. The comparison of STR

profiles from forensic DNA evidentiary traces with reference

samplesandDNAdatabaseshasprovidedessentialinformationin many forensic cases [1]. Standard practice is to use Capillary Electrophoresis (CE) to analyse STR length variation. In recent

years,Massively Parallel Sequencing(MPS) wasintroducedasa newmethodtoanalyseSTRsandotherforensicDNAmarkers[2,3].

MPS enables the simultaneous detection of both length and

sequence variation of STRs, which increasesthe discriminatory value substantially [4–6]. The output of CE consists of peaks reflectingfluorescentsignalintensitieswiththeirownrespective shapesandpeakheights.TheoutputofMPSdataanalysisconsists simplyofreadcountsoftheobservedsequences.Bothmethodscan sufferfromtheoccurrenceofPCRartefactssuchasSTRstutters[7]. This especially complicatestheanalysis of STR profilescoming

from multiple contributors, which is common in forensic

evidentiarytraces[8].Thelevelofstutterformationdependson a numberofdistinctaspects ofthesequence,includingtheA/T contentoftherepeatunitandthenumberofconsecutiverepeat units occurring in an STR [9]. Since any specific STR length

* Correspondingauthor.

E-mailaddresses:j.hoogenboom@nfi.minvenj.nl(J.Hoogenboom),

k.van.der.gaag@nfi.minvenj.nl(K.J. vanderGaag),[email protected]

(R.H.deLeeuw),t.sijen@nfi.minvenj.nl(T.Sijen),[email protected](P.deKnijff),

[email protected](J.F.J. Laros).

1

Theseauthorscontributedequally.

http://dx.doi.org/10.1016/j.fsigen.2016.11.007

1872-4973/ã2016TheAuthors.PublishedbyElsevierIrelandLtd.ThisisanopenaccessarticleundertheCCBY-NC-NDlicense( http://creativecommons.org/licenses/by-nc-nd/4.0/).

ContentslistsavailableatScienceDirect

Forensic

Science

International:

Genetics

(2)

identifiedbyCEcanconsistofmultipledifferentsequences,these CE-identifiedlengthvariantsshowalargervariationinmeasured stutterpercentage than individual sequences analysed through MPS.ThisdecreasedvariationinstutterpercentageforMPSSTR datamayaidintheinterpretationofmixtures[2],asitallowsfora betterpredictionofstutterbehaviour,whichcanbeusedtofilter thedataforstutterproducts.Existingsoftwarepackagesforthe analysisofSTRsequencingdata[10–12]donotsupportextensive filteringandcorrectionofsystemicPCRand/orsequencingerrors andthereforeseemlesssuitedforanalysisofmixedDNAsamples. Thispromptedustodevelopasoftwarepackagethatharboursthe followingfeatures:1)characterisationandcorrectionofnoisein thesequencingdatacausedbyPCRstutterorothersystemicPCR and/orsequencingerrors;2)visualisationofsequencingdataas comprehensiveprofiles;3)filteringofdataingraphsandtables withuserdefinablethresholds and4)open-sourceaccessibility. Forensic DNA Sequencing Tools (FDSTools) is available via the PythonPackageIndex(eitherbymanualinstallationorbyusing thecommand‘pipinstallfdstools’).Weassesstheperformanceof

FDSToolson31two-personmixturesgenotypedviathePromega

PowerseqTMAutoSystemforwhichwefirstgeneratedareference datasetof450samples.

2.Materialsandmethods

2.1.Samplepreparation

PCRproductsandsequencinglibrarieswere preparedasdescribed previously[2]usingaprototypePromegaPowerseqTMAutoSystem containing23STRsandamelogenin.Asetof450Dutchsamples[13]

and31two-personmixtureswereamplifiedandsequenced.The

mixturesconsistedofthreecombinationsoftwodonorsselected randomlyfromapoolofunrelatedindividuals,whichweremixedin differentratios.Theminorcomponentsinthemixturescontributed 0.5%(sixmixtures),1%(sixmixtures),5%(fourmixtures),10%(six mixtures),20%(sixmixtures)and50%(threemixtures).

Sincethemixtureswereusedtotesttheperformanceofthe softwareandalsotodetermineanalysisthresholdsthatarefitfor purpose,webalancedtheinfluenceofvaryingDNAinputsinthe

PCR and increased drop-out due to low DNA input. This was

achievedbytheuseofa minimumof theminorcomponentof

60pginthe0.5%,1%and5%mixtures,resultinginatotalDNAinput of12ng,6ngand1.2ng,respectively(60pgresultedinlessthan 20%drop-outinthevalidationofPowerplex6C[14]).Thesame totalDNAinputof1.2ngwasusedforthe5%,10%,20%and50% mixtures(resultingin120pgand240pgoftheminorcomponents inthe10% and20% mixtures,respectively).TheDNAinputwas 0.5ngforsingledonorsamples.

Thegenotypesofthedonorsusedinthemixtureswereknown, which enablestheidentification ofdrop-in and drop-out allele calls.Paired-endsequencingdataofallampliconswasgenerated usingtheMiSeq1Sequencer(Illumina).

2.2.Initialdataprocessing

InFig.1,themaintoolsoftheFDSToolspackageandtheirrolein thedataanalysispipelinearedisplayed.Thetoolscanbesplitinto threefunctionalgroups:toolsforreferencedatabasecreation,tools forreferencedatabasecuration(dataqualityassessment)andtools forcasesamplefilteringanddatainterpretation.Inaddition,the packagecontainsinitialdataprocessingtoolssuchasTSSV[10]that arecommontoreferencedatabasesamplesandcasesamples.

Fig.1.Flowchartoftheanalysisprocess,showingthemaintoolsofFDSTools.Flowchartshowingthemaintools(bluerectangles)oftheFDSToolspackageandtheirrolesin thedataanalysispipeline.TheoutputofeachtoolcanbevisualisedusingtheVistool(notshown).(Forinterpretationofthereferencestocolourinthisfigurelegend,the readerisreferredtothewebversionofthisarticle.)

(3)

2.2.1.Paired-endreadmerging

Using paired-end sequencing, forward and reverse strand

molecules of each amplicon were sequenced from both ends.

Thefirst300nucleotidesfromeitherendwereobtained.These readpairsweremergedintoaconsensusreadbyaligningtheread pair such that the largest possible overlap is obtained while allowingforupto33%mismatchesintheoverlappedregion.Most ampliconswereabout300basepairsinlengthandprovidedfully complementaryreadpairs.

WithSTRampliconsthatarelongerthan300bp,aproblemmay occurwhenbothreadsendinthemiddleoftheSTRstructureand thepairmaybemergedintoatruncatedSTRsequence.Amodified versionofFLASH1.2.11[15](availableviagithub.com/Jerrythafast/ FLASH-lowercase-overhang)wasusedtomarkthebasesthatwere notintheoverlappedregioninlowercaseintheconsensusread. ThisenablesdetectionoftruncatedSTRsequencesindownstream analysis.

2.2.2.Linkingreadstolociandalleles

Themergedreadsarelinkedtospecificlociandallelesbythe TSSVtool,whichisawrapperaroundasimplifiedversionofthe TSSV[10] programcalled TSSV-Lite.TSSV linksreads tolociby scanningthereadsforthesequencesflankingtheSTRlociused. Theflankingsequencesofeachlocus,thatusuallyrepresentthe most50nucleotidesoftheprimers,areprovidedtoFDSToolsina libraryfile,togetherwithvariousotherdetailsaboutthelociused. SupplementaryFile1representsthelibraryfileusedinthisstudy. Thefilecontainsadescriptionforthecontentsofeachsection.

Eachreadisscannedfortheseflankingsequencesbycomputing

alignments. In this study, the flanking sequences were 18

nucleotidesinlengthand twosubstitutions(ortwoinsertedor deletedbases)perflankwereallowedinthealignment.Readsare categorised as ‘unrecognised’ if no flankingsequence is found. Furthermore,bothflankingsequencesarerequiredtohaveatleast oneuppercaseletter,whichensuresthatoverlappedreadsthatare potentially truncatedare categorised as ‘unrecognised’ as well. Readsinwhichonlyoneflankingsequenceisfoundwithatleast oneuppercaselettergetlinkedtoalocusbutflaggedas‘nostart’or ‘noend’dependingonwhethertheleftorrightflankismissing, respectively(optionally, thesereads canbewritten toseparate fastaorfastqfiles).

ThemainoutputofTSSVisatextfilewithtab-separatedvalues. Thefilecontainsonelineforeveryuniquesequenceofeachlocus. Thecolumnsincludethenameofthelocus,thesequence,andthe

numberofreadscarryingthisparticularsequence.Readcountsare givenseparatelyfortheforwardandreversestrand.

TSSVincludesadditionaloptionsforfilteringsequencesthatare seentoofewtimesandsequenceswithalengthoutsideagiven range(e.g.,primer-dimers).Thisrangecanbespecifiedseparately foreachlocus.Furthermore,filteredsequencescanbeaggregated into a single ‘othersequences’ category for each locus. In this project,onlysingletons(i.e.,sequenceswithonlyoneread)were aggregatedtothe‘othersequences’category.

2.3.Buildingareferencedatabase

OnefunctionofFDSToolsisthebuildingofareferencedatabase. Suchadatabasecanbeusedtoobtainestimatesofrecurring allele-specific systemic noise. Here, ‘noise’ refers to the complete collectionofsequencesobservedinasample,exceptthesample’s trueallelicsequences.Noiseincludesanyartefactderivingfrom thePCRaswellasthesequencing(suchasPCRstutteror single-nucleotide errors). Additionally, based on the reference data a statisticalmodelcanbederivedthataimstopredictstutterratios forallelesnotpresentinthereferenceset.

The creation of a reference database involves various tools includedintheFDSToolspackage,whichwillbediscussedinthe nextsections.Inadditiontotheseseparatetools,FDSToolsoffers the Pipeline tool,which convenientlyintegrates theentire data analysispipeline.UsersareadvisedtousePipelineasitremovesthe complexityofhavingtorunseveralseparatetoolsandtocombine theiroutput.Pipelinetakesasimpleconfigurationfilecontaining theanalysis parametersand automaticallyrunstheappropriate tools.

Buildingareferencedatabaseisatwo-phaseprocess.Inthefirst phase,thereferencesamplesareanalysedinaglobalmannerto identifytheirallelesandrejectthosesamplesinwhichthealleles arenotreadilyidentified.Inthesecondphase,thesystemicnoise ofeachoftheseallelesisanalysedindetail.

2.3.1.Allelecallingforreferencesamples

Determiningtheallelesofsingledonorreferencesamplesisa fairlystraightforwardprocessbecausethesegenerallyrepresent theoneortwomostabundantsequencesforanylocus.FDSTools includes Allelefinder to call alleles this way. It is applied after Stuttermark,whichisdescribedbelow.Anumberofthresholdsare used to guard against including alleles of potential low-level contaminations,whichareoutlinedinFig.2.Forheterozygousloci,

Fig.2.ThresholdsusedbyAllelefindertocallallelesinreferencesamples.Sequencevariantswithareadcountabovetheallelethresholdarecalledasalleles.Thefour lighter-shadedbarsrepresentstuttervariants(asrecognisedbyStuttermark),whichareignoredbyAllelefinder.

(4)

asecondalleleisonlycalledifitpassestheallelethreshold,which isdefined inthis project as 30%of theread countofthemost frequentalleleatthesamelocus.Asweexpectnostutterabove 30%[2],thisthresholdseparatestheallelesfromnoise.Noalleles arecalledatalocuswhenadditionalsequencesoccurthathavea

read count below the allele threshold but above the noise

threshold(whichisdefinedas15%ofthemostfrequentallelein thisproject)orifathirdsequencepassestheallelethreshold.If morethantwolociinthesamesamplefailtogivearesultforthese reasons,theoverallqualityofthesampleisconsideredtoopoorto reportanyalleles.Additionally,Allelefindercanbeconfiguredtocall atmostonealleleathaploidloci.

The three potential pitfalls are 1) PCR stutter artefacts that exceedthenoise threshold;2)strongread countimbalancefor heterozygousalleles,whichmaybetheresultofe.g.,primer-site sequencevariantsand3)autosomaltrisomy,whichisrare.Todeal

with the problem of stutter, each sample was analysed with

Stuttermark[2]beforecallingalleles.WithStuttermark,sequences thatareinastutterpositionofanothersequencewhilehavinga readcountbelowauser-suppliedpercentagewithrespecttothe othersequencearemarkedas‘stutter’.Sequencesthathavearead countthatistoohightobeexplainedbystutteralonewillnotbe markedas‘stutter’,astheymaycoincidewithagenuineallele.The thresholdsusedherewere30%for1stutter(lossofarepeatunit) and 10% for +1 stutter (gain of a repeat unit). For 2 stutter

products, a 30% threshold of the 1 stutter product is used.

Sequencesthataremarkedas‘stutter’arecompletelyignoredby Allelefinder.

Allelefinderproducesthelistofallelesandareportdetailingfor which samplesand loci allelecallingis rejected and for which reasons.

2.3.2.Estimatingaverageallele-specificsystemicnoise

Foreachallele,aprofileofrecurringsystemicnoise,including PCRstutterproductsaswellasanyother‘sideproducts’,canbe generatedbasedonthereferencedata.Noiseprofilesarealways computedseparatelyforforwardandreversereads,becausestrand biasmayexistinthesequencingtechnologyused.Profilesarealso computedseparatelyforeach locus,undertheassumptionthat noiseproductionisnotinfluencedbyallelesofotherloci.Thelevel ofnoiseisexpressedasthenumberofnoisereadsasapercentage ofthenumberofreadsoftheparentallele.InthecontextofPCR stutteranalysis, thisquantityis oftenreferredtoasthe‘stutter ratio’, despitetherepresentation as a percentageof the parent allele.Weusethegeneralisedterm‘noiseratio’(alsorepresented as a percentage of the parent allele) to account for all other systemicnoiseaswell.

Noiseratio¼NoiseAllelereadsreads100

In homozygoussamples,thenoiseratiocanbecalculatedby dividingthe number of reads of a non-allelic sequence bythe number of reads of the allele.Allele-specific noise profilesare readilycomputedfromhomozygoussamplescarryingthisalleleby scalingthereadcountsineachsamplesuchthattheparentalleleis 100andaveragingthenoiseratiosforeachnoisesequence.These per-allelenoisestatisticsandotherstatistics,suchasthestandard deviationsofthenoiseratioscanbeobtainedusingBGHomStats.In heterozygoussamplestheextractionofnoisesequencesismore complex,becauseithastobedeterminedwhichproportionseach allelecontributedtotheobservednoise sequences.Weassume thatnoiseinheterozygoussamplescorrespondstothesumofthe noiseprofilesofthetwoalleles,aftertheapplicationofascaling correctiontoaccountfordifferencesintheamountofeachallele amplified.Thisisneededasevenforheterozygousallelepairs,PCR efficiencymayvaryduetoprimerbindingsitesequencevariation

orSTRlength[16].Toextractnoisefromheterozygousreference samplesaniterativeapproachwastakenandimplementedinthe BGEstimatetoolinFDSTools.

Inessence,thealgorithm,whichisdiscussedinmoredetailin SupplementaryText1,seeksanon-negativeleastsquaressolution tothematrixequationAP=C.Inthisequation,CisanNMmatrix ofconstantsderivedfromthereadcountsinthereferencesamples, AisanNNmatrixsummarisingtheallelebalanceinthesamples, and P is an NM matrix containing the estimated profilesof systemicnoise.Nisthenumberofuniquegenuineallelesamong

the reference samples and thus also the number of profiles

producedandMisthetotalnumberofuniquesequencesobserved. MatrixCiscomputedonceatthestartofthealgorithm.Each rowinCcorrespondstoonealleleandcontainsthesumoftheread countsofallsamplesthathavethatparticularallele,afterscaling thealleleto100readsforhomozygoussamplesand50readsfor heterozygotes. The noise profiles in P are initialised with the assumptionthatnosystemicnoiseispresent,i.e.,allelementsare setto0, except for theelementsthat correspond totheactual alleles,whicharesetto100.

Thealgorithmthenproceeds byrepeatedlyre-estimatingthe allelebalancematrixAwhilereadingcross-contributionsbetween

the alleles from the current profiles P and subsequently

re-estimating P by finding a non-negative least squares optimal

solutiontoAP=C.ThevaluesthusobtainedinParetheaverage noiseratiosofallobservedsystemicnoiseforallalleles(i.e.,each rowinPcontainsthenoiseprofileofoneallele).

Toavoidnoisefromoneallelebeingincorporatedinthenoise profileofanotherallele,aminimumofthreedifferent heterozy-gousgenotypesperallelewasusedinthisstudy.Athresholdcanbe setfortheminimalreadcountofnoisetoconsiderandtheminimal

percentage (weused 80%)of reference sampleswiththe same

allelewhichshouldcontainthesamenoisebeforeitisincludedin thenoiseprofile.Eachoftheseparameterscanbesetusingvarious optionsofthe‘fdstoolsbgestimate’command.

2.3.3.Relatingtheamountofstuttertorepeatlength

Withthemethodsoutlined above,profilesofsystemicnoise wereobtainedforeachallelepresentinthereferenceset.However, one would also like to be able to filter and correct the noise originatingfromallelesthatarenot(yet)includedinthereference set,ascasesamplesmaybeencounteredthatcontainallelesfor which noreferencesample was available. For this purpose, we

developedamethodtopredictthesequenceandcorresponding

amountofPCRstutterartefactsthatwouldbeproducedforany alleleofagivenlocus.Notethatthismethoddoesnotpredictnoise otherthannoiseresultingfromSTRstutterorsinglenucleotide stretches.

Previous studies have shown that the amount of stutter is

strongly correlated with the length of the repeated sequence

[17]andeven more sowiththenumberof consecutive repeat

units [2,18]. The FDSTools tool Stuttermodel seeks to fit

polynomial functions to the repeat length and stutter ratio in

homozygous referencesamples. Stuttermodel scans each of the

alleles forall positionswhere a particularrepeatunit (e.g.,the

sequence ‘AGAT’) is repeated and records the length of this

repeat, as the number of nucleotides, including incomplete

repeatsatthebeginningorendoftherepeatedstretch.Foreach

sample with this allele, the number of noise reads that lack

exactly one repeat is counted. Reads that combinethe loss of one repeat with one or more other differences (e.g., substitu-tions,orstutterinanotherstretchofrepeatsinthesameallele) areincludedinthiscount.Thecountsthusobtainedareusedto

compute the noise ratios of individual stutter sites and a

polynomial function is fitted to quantify the relationship

(5)

Thisanalysisisrepeatedforeachuniquerepeatunitofalength

betweenoneandaconfigurablemaximumnumberofnucleotides

(inclusive), treating cyclicallyequivalent units (e.g., ‘ATAG’ and ‘AGAT’)andtheirrespectivereversecomplements(e.g.,‘CTAT’and ‘ATCT’)synonymously.Theamountof+1stutter,2stutteretc.is analysedthesameway.

Becausedifferentlocibehavedifferentinstutterformation,a separatefunctionisfittedforeachlocus.Additionally,apolynomial functionisfittedtoalldataatonce,whichisusedtopredictstutter inallelesoflociforwhichinsufficientreferencedatawasavailable tofitalocus-specificfunction.Separatefunctionsarefittedforthe forwardandreversestrands.

Foreachfittedfunction,Stuttermodelalsodeterminesthelower boundoftherepeatlengthforwhichthefunctiongivesmeaningful results.Thislowerboundisdefinedasthelowestrepeatlengthfor whichthefunctionproducesanonnegativeresultandthefunction isnon-decreasing.Belowthisthreshold,andinanyotherpoints wherethefunctionvaluewouldbenegative,thefunctionvalueis settozero.

Thequalityoffitis assessedbycomputing thecoefficientof determination, R2¼1

S

iðyiy^iÞ 2

S

iðyiyÞ 2 where y^i¼ f0i ffi0 i<0 

withyithenoiseratiosofthereferencesamples,ythemean,fithe polynomialfunction’sestimateofthenoiseratioofsamplei,and^y themodifiedfunctionvalue.TheR2scorewillbeclosetoonewhen thefunctionisagoodfitandlowerotherwise.

Stuttermodel supports fitting polynomial functions of any

degree.To preventover-fittingwhile stillallowinga non-linear

relationship, second-degree polynomials (with a minimum R2

score)wereused.IncaseswherethefitforonestrandhasanR2 scoreabovethethresholdwhilethefitfortheotherstrandscores belowthethreshold,bothfitsarerejectedtopreventunintended introductionofstrandbiasbyfilteringstutterononlyonestrand. 2.3.4.Curatingthereferencedatabase

Tomakesureallreferencesampleswereofgoodqualityandall alleleswere called correctly, they wereput through the same

analysis pipeline as case samples, thereby performing noise

filteringandcorrectiononthereferencesamples.Itisimportant tonotethatthesereferencesampleswerepreviouslygenotyped byus in greatdetailusing CE [13].The remaining amountsof noisein eachsamplewereassessedusingBGAnalyse(described below)toidentifypotentiallyunsuitable referencesamplesthat still passed the thresholds of Allelefinder. Any sample with a

notablyhigher amountof remainingbackgroundwas manually

removedfromthesetofreferencesamplestopreventpollutionof thenoiseprofiles.

BGAnalyse was developed and employed to analyse the

remainingnoiseaftercorrection.Foreachlocusandeachsample, thistoolcalculatestheleastfrequent(thiscanbeanegativevalue becauseofover-correction),mostfrequent,and totalnoise asa percentageofthenumberof readsofthehighestalleleateach locus.Theseresultsaresubsequentlyvisualisedtoeasilyidentify potentiallyproblematicsamples.Inthevisualisation,samplescan besorted byany of thecalculated values orbycoverage (total numberofreads).Samplesweresubjectedtomanualinspection andanysamplethatexhibitednon-stutterproductswithcorrected readcountsabove4%ofthemostfrequentalleleorabove2%ofthe totalreadswasrejected.

2.4.Analysingcasesamples

Theanalysisofmockcasesampleswasperformedina three-stepprocesswhichisdescribedinthefollowingsections.

1. A prediction was made for the amount of stutter for each

sequenceinthesample,usingthefittedpolynomialfunctions obtainedfromrunningStuttermodelonthereferencesamples. Thesepredictionsareusedtoextendtheallele-specificnoise profiles obtained from running BGEstimate on the reference samples.

2. Theextractednoiseprofilesareusedtofilterandcorrectthe noiseinthecasesample.

3. Alleles are called and the sample is subjected to manual

interpretation.

Similartothecreationofareferencedatabase,analysingcase

samples involves multiple tools discussed in the following

sections.Pipelineoffersaconvenientwaytoautomaticallyanalyse acasesamplewithalltoolsdiscussed.

2.4.1.Predictingstutteramountsforunknownalleles

Becausecasesamplesmaycontainallelesthatarenotpresentin thereferencesamples,noiseprofilesfortheseallelesneedtobe predicted. FDSTools includes the BGPredict tool, which uses a previously created Stuttermodel file to predict the amounts of stutter artefacts for alleles not present in the reference data. BGPredictfindsallsequencesintheanalysedcasesampleinwhich aparticularrepeatunitisrepeated.Theexpectedamountofstutter in this repeatis then computed usingthe correspondingfitted

polynomial function from the Stuttermodel file. All possible

combinations of stutter aretaken into consideration when the

frequencies of each stutter artefact are computed. The noise

profilescreatedinthiswayareusedtoextendthenoiseprofilesin thepreviouslycreated BGEstimate file(atool calledBGMerge is includedinFDSToolsforthispurpose).

2.4.2.Noisefilteringandcorrectionincasesamples

To beabletofiltersystemicnoise in casesamples,one first needstodeterminewhichallelesarelikelypresentinthesample. Tothisend,thealgorithmofBGEstimateisessentiallyreversed,i.e., thegoalisnowtosolveforainaP=c,wherecisarowvectorwith thesample’sreadcountsfortheMsequencesinthenoiseprofiles andaisarowvectorwiththeestimatedamountofeachoftheN profilesin the sample. P is the NM matrix of noise profiles obtainedfromBGEstimate,extendedwiththepredictionsobtained fromBGPredict.Solvingforaisdoneinanon-negativeleastsquares senseasbefore,givingestimatedallelecontributionsthatbestfit thevarioussequences–allelesaswellasnoise–presentinthe sample.

Background-correctedread countscanthen becomputedby

firstsubtractingthescaledprofilesfromthesample’sreadcounts

d caP

and then adding the total size of each profile to the

correspondingallele,i.e., dn dnþan

S

M

m¼1Pn;m; 8n2½1:::N

Notethatdmayhavenegativeelementsifthesamplecontainsa

loweramountof acertain sequencethanwas predictedbythe

profilesofitsdominantalleles.

FDSToolsoffersBGCorrecttofilterandcorrectbackgroundnoise followingtheprocedureoutlinedabove.Givenasampledatafile

(obtained from TSSV for example) and a file containing noise

(6)

additional columns giving the amounts of each sequence attributedtonoiseandtheamountsofeachsequencethatwould berecovered by noise correction (i.e., adding the noise tothe originating allele). These values are given separately for the

forward and reverse strand. Although the method by which

BGCorrectcomputes them resultsin non-integer values, it was decidednottoroundthesenumberstoavoidunnecessarylossof precision.Ifnecessary,thesenumberscanberoundedtointeger values,thereby easingtheinterpretation as‘read counts’ when presentedinagraphortableinareport.

2.4.3.Allelecallingforcasesamples

ThenaïvemethodofcallingallelesthatAllelefinderusesisnot appropriateforcasesamples,sincethesemaycontainallelesof multiple contributors in different quantities. Therefore, calling alleles in case samplesis done bycomputing various statistics

based on the information of the detected sequences and

subsequentlysettinginterpretationthresholdsonthesestatistics. Forthis,Samplestatswasdeveloped,whichoperatesonandadds variouscolumnstotheoutputofBGCorrect.Samplestats automati-callymarkssequencesas‘allele’usingthethresholdsoutlinedin Table1.

Alleles canalso becalledwhile visualisingthe sampledata, hence,FDSToolsincludestheSamplevisvisualisation.Bymeansof theinteractivegraphicaluserinterfaceofSamplevis,thesamesetof thresholdsasdepictedinTable1areavailabletofilterthevisible sequences and to automatically call alleles. Thresholds can be specifiedseparatelyforthegraphsandforthetables.Whilethe tabledisplaysthecalledalleles,lessconservativesettingsmaybe usedforthefilteringofthecorrespondinggraphtoensurevisibility of allelesjust below theallele-callingthreshold. The results of

changing the thresholds are immediately visible. Clicking a

sequenceinanyofthegraphstogglesits‘allele’status.Thisallows theusertomanuallyaddallelestoandremoveallelesfromthe profile.Anoteisaddedtomanuallyaddedalleles,statingthatthe alleleis‘User-added’.Similarly,iftheuserremovesanyalleles,the alleleremainsvisiblebuta‘User-removed’noteisadded.Inthis

wayitremainseasytotracebackexactlywhichallelesmeetthe

thresholdsandwhichonesweremanuallyaddedandremoved.

Samplestatscanalsobeusedtofiltersequencesusingthesame typesofthresholds(albeitwithmorestringentthresholdvalues than used for allele calling, as potential alleles should not be filteredout)and(optionally)aggregatethefilteredsequencesper locustoasinglelinecategorised‘othersequences’.

2.5.Visualisation

For visualisation of the data, FDSTools makes use of the

JavaScript graphing library Vega [19]. Vega graphs can be

embedded ona web page, exposing a JavaScript programming

interfacethatallowsforupdatingthegraphsbasedontheuser’s interactionwiththewebpage.VegacanalsorunonNode.js,which allowsittobeincludedinautomatedanalysispipelinestogenerate (static)imagefiles.

FDSToolscomeswithVegagraphspecificationsand accompa-nyinginteractivewebpages(HTMLfiles)tovisualisetheoutputof eachtool.TheVistoolcanbeusedtoobtainself-containedHTML

files containing visualisations of various types of data files

generatedbytheothertools.Forexample,Samplevisvisualisesa sample data file as a sequence profile and Profilevis visualises backgroundnoiseprofilesobtainedfromBGEstimateorBGPredict. AdescriptionofeachvisualisationcanbefoundinSupplementary

Table1.Whenviewedinawebbrowser,thewebpageprovides

additionalcontrolsthatallowtheusertofilterthedata,switch betweenlinearandlogarithmicscales,orselectdifferentsubsetsof thedatatovisualise.Thedefaultvaluesforthesettingsontheweb pagecanbesetwhentheHTMLfileisgeneratedbytheVistool. Thewebpagesalsooffertheoptiontosavethedisplayedgraphs asaScalableVectorGraphics(SVG)orrasterisedPortableNetwork

Graphics (PNG) image, so that they can be imported into

documents. Alternatively, the Vis tool can supply a raw Vega

graphspecificationfile(eitherwithorwithoutembeddeddata),

whichcanthenbeusedbyVegatogenerateSVGorPNGimages

directlyonthecommandline.

Table1

InterpretationthresholdsforcasesamplesinSamplestatsandSamplevis.Sequencesthatmeeteitherthe‘Percentagecorrection’or‘Percentagerecovery’threshold(orboth)as wellasalltheotherthresholdswillbemarkedas‘allele’.Thesethresholdvaluesareevaluatedafternoisecorrection.The‘Allelecallingdefault’columnliststhedefault thresholdvaluesforcallingalleles.The‘Filteringdefault’columnliststhedefaultvaluesusedforfilteringdisplayedsequencesinSamplevisgraphs.

Threshold Description Allele

calling default

Filtering default

Totalreads Minimumnumberofreadsperallele.Non-systemic(andthusunfilterable)sequenceerrorsoccursporadically.Thisthreshold ensuresthataminimalamountofamplifiedproductispresenttosupporttheallelecall.

30 5

Readsper strand

Minimumnumberofreadsperalleleforbothstrands.Thisthresholdcanbeusedtoexcludelowtemplatesequenceswithstrong strandbias.

1 0

Percentage ofmost frequent

Thenumberofreadsasapercentageofthenumberofreadsofthemostfrequentalleleatthelocus.Thisthresholdsetsalimittothe mixtureproportionsthatcanbeanalysedinmixedsamplesortotheallelebalanceinsampleswithasinglecontributor.

2% 0.5%

Percentage oflocus

Eachallelecontributesatleastthispercentagetothetotalnumberofreadsofthelocus.Withthisthreshold,aminimumcontribution percentagecanbeenforced.

1.5% 0%

Percentage correction

Thispercentagederivesfromthenumberofreadsafternoisecorrectionminusthenumberofreadsbeforecorrection,whichis dividedbythenumberofreadsbeforecorrection.Consequently,thepercentagecorrectionisnegativeifnoisecorrectionresultedina reductionofthereadcountofasequence.Therefore,withthisthresholdsetto0%,anysequencerepresentingnoisewillnotbecalled asanallele.

Tobeabletodetectallelesofminorcontributorsthatcoincidewithnoiseproductsforthemajorcontributor’salleles,the‘percentage recovery’thresholddescribedbelowisallowedtooverrulethisthreshold.

0% 0%

Percentage recovery

Thenumberofreadsaddedbynoisecorrectionasapercentageofthetotalnumberofreadsafternoisecorrection.Afternoise correction,atleastthispercentageofreadsmusthaveoriginatedfromcorrectednoise.Therationalebehindthisthresholdisthat onlyallelicsequenceswillhavesubstantialamountsofrecoveredreads.Whenanalleleofaminorcontributioncoincideswiththe stutterofanalleleofthemajorcontributor,noisewillbeextractedandaddedtothemajorcontributor’sparentalleleresultingina negativepercentagecorrection.Yet,sincetheminorcontributor’scontributiontothereadsalsoresultsinnoiseproductsthatare corrected,theallelewillreceiverecoveredreadsandapercentagerecovery>0%.Toallowthecallingofallelesforwhichnonoise profileexists(ornonoisewasdetected)inthereferencedatabasethethresholdissetat0%bydefault.

(7)

3.Resultsanddiscussion

WedevelopedFDSTools,asoftwarepackagecontainingasuite oftoolsthatcanbeusedfortheanalysisofforensicMPSdata.With thesetools,FDSToolsprovidesdetailedinsightinthequalityofa sample and the noise profile of a certain allele (or sequence

variant). In Supplementary Table 1, an overview of all tools

currentlyavailableinthepackageisprovided,ofwhichaselection wasdescribedinmoredetailsinSection2.

Toenhancetheanalysisofmixedsamples,FDSToolsidentifies, extractsandcorrectsforPCRorsequencingnoisesuchasstutter fromareferencedatabasewiththeaimtodiscernlow mixture proportions. Different STR amplification assays and different amplification protocols could result in different noise. It is thereforeimportanttobasethedatabasefornoisecorrectionon referencesgenerated bya methodthat is representablefor the caseworksamplestobeanalysed.

Notethatitisnotpossibletocorrectallnoisecompletelyasthe levelofnoiseshowsvariationbetweensamples.

3.1.Referencedatabase

Our reference samples were sequenced with an average

coverageof65,000readsandamodeofabout45,000reads.For thepresentstudy,aminimumcoverageof6000readspersample wasrequired,whichrelatestoanaverageof250readsperlocusas 24 loci wereco-amplified.For heterozygousloci, less than 250 readsperlocusisnotsufficienttoquantifylowamountsofnoise accurately.

3.1.1.Referencesamplecuration

Sincethereferencedatabaseisusedtofilterandcorrectnoisein casesamples,itisessentialthatthereferencesamplescontainno contaminantsandreferenceallelesarecalledcorrectly.Although all other stepscan be performed automatically by FDSTools, a manualcurationofsamplesinthereferencedatabaseisneeded. BGAnalysewasdevelopedtofacilitatethisprocessbyvisualising potentialoutliers.

Allelefinder automatically rejected two out of the initial450 sampleswhichwereclearlycontaminatedandthreesamplesthat hadtoolowcoveragetodetectallelesreliably.Manualinspection ofsampleswithanotablyhigheramountofremainingnoiseafter correctioninBGAnalyseresultedintherejectionofanadditional16 samples.Reasonsforrejectionwerelow-levelcontamination,low coverageandlow sequencingquality.Theinteractive BGAnalyse visualisations displaying the remaining noise for the reference samplesareavailableinSupplementaryFile2a(beforedatabase curation)and2b(aftercuration).Forthemajorityofsamples,the highestremainingnoise variantin thecompleteprofile didnot exceed3%ofthenumberofreadsofthehighestalleleatthelocus whilewithoutcorrectionSTRstutterscanrepresentover20%.For theremaining429samples,nodrop-inordrop-outwasobserved whencallingallelesusingAllelefinderwiththesettingsdescribedin Section2.3.1.

3.1.2.Extendingnoiseprofilesfornoisecorrection

AsdescribedinSection2.4.1,casesamplesmaycontainalleles which arenotpresent inthereferencedatabase.In suchcases, FDSToolsresortstonoisepredictioninsteadofnoiseestimation.A columnintheoutputfileofBGCorrectmarksifcorrectionhasbeen performedusingdataobtainedfromBGEstimate(iftheallelewas availableinthereferencedatabase)orbyusingBGPredict(ifnot availableinthereferencedatabase).

FromtheresultsfromStuttermodelitbecomesevidentthatfor simpleSTRsconsistingofasinglerepeatingelementorforlong stretchesofaspecificrepeatingelementwithina complexSTR,

only few reference samplesare needed toreliably fit a stutter

model. However, when complex repeats consist of several

repeating elementsof which someshowlittle lengthvariation, correctionusingthestuttermodelissuboptimalasexemplifiedby thepredictionsforD12S391.ThisSTRlocusconsistsoftworepeat units; an AGAT repeatstretchof highly variablelength and an ACAGrepeatthatis repeated6 to8timesfor mostindividuals. Since Stuttermodel predictsthe amountofstutterbased onthe repeat length,at leastfour differentrepeatlengths need tobe availableinhomozygousreferencesamplestoobtainareliablefit. However, the setof reference samples used in this study only containedhomozygoteswith6to8repeatsofACAG,whichisnot sufficientlyvariabletoobtainareliablefit.Consequently,ACAGis

omitted from thestutter modelfor D12S391, even thoughthis

repeatstuttersupto9%forthelongerrepeats(8repeatunits,data notshown).WhenBGEstimatedoesnotobtainabackgroundnoise profile,BGPredictwillnotcorrectstutterinthisrepeatandthus stutterswillremainpresent.Asalastresort,BGPredictoffersthe possibilitytouseastuttermodelbasedondatafromalllocithat have the same repeat unit sequence if no locus-specific fit is available.SupplementaryFig. 1displaysthestuttermodelobtained fromthesetof429referencesamples,includingtheindividual observationsonwhichthemodelwasbased.

Combining BGEstimate and BGPredict (by using BGMerge)

insteadofusingBGPredictaloneisexpectedtoreducethenoise

remaining after correction, as the combined correction also

corrects for noise other than stutters. This is confirmed when

we determine the percentage of remaining noise (the reads

representingremainingnoiseasapercentageofthereadsforthe mostfrequentalleleatthelocus)andplotthehighestpercentage

and various percentiles (90th, 95th and 99th) (Supplementary

Fig.2a,b).The percentilesillustrate howoften samplesexhibit outlyingnoisesequencevariantsandwhenthe99thpercentileis regarded,BGPredictalone retainsonaverage2.6%noise andthe combinedcorrection2.4%.Also,thecombinedcorrectionresultsin lessovercorrectedvariants.

Thus,BGPredictcanbeusedwithoutBGEstimatewithaslightly reducedaccuracyincorrection.NotethatBGEstimateshouldnotbe usedwithoutBGPredictsinceallelesnotincludedinthereference databasewillnotbecorrected,whichcanresultinacombination ofcorrectedanduncorrectedallelesandremainingnoiseforthe uncorrectedalleles.

3.1.3.Referencedatabasesizeandcoverage

Totesttheeffectofthesamplesizeandtypefromwhichthe

reference database is built, we used the complete curated

reference database of 429 samples and a random selection of

100 samples (both with combined BGEstimate and BGPredict

correction,whichwasfoundtobeslightlybetterasdescribedin Section3.1.2).SupplementaryFig.2c,ddisplayanoverviewofthe mostfrequentandthetotalremainingnoiseateachlocusafter correction.Thedifferentpercentilesofthereferencesamplesare

given to illustrate how often samples exhibit outlying noise

sequencevariants.

Whencomparingtheresultsforthecompletedatabasewiththe resultsforthesubsetof100samples,thedifferenceinremaining

noise seems surprisingly small (Supplementary Fig. 2c, d).

However,withasmallerdatabase,lessalleleswillfitthecriteria tocreateaBGEstimatenoiseprofileandmoreallelesrelyonnoise prediction by BGPredict. Indeed, for the reference set of 429 samples,only3.5%oftheallelesarecorrectedusingBGPredict.This percentageincreasesto10.2%whenthecorrectionisbasedonthe subsetof100samples.

In alargerreferencedatabasemorealleleswillbeobserved. SupplementaryFig.3displaystheallelesobservedinthereference databasesof429and 100samples.Tofitthecriteriatocreatea

(8)

BGEstimate noise profile, alleles need to be present as a homozygousgenotypeorbeavailableaspartofsharedgenotypes withatleastthreeotherallelesthatmustalsofitthesecriteria.For thestuttermodel,onlythehomozygousgenotypesareused.Inthe larger429 database, more alleles fit these criteria than in the smaller100samplesetdatabase.

Toexaminetheeffectofreadcoverageofthereferencesamples onnoise profileanalysis, wegeneratedtwo subsets comprising sampleswithhighorlowcoverage,whichisspecifiedasatotal

read count between 82,000 and 350,000 or 8000 and 44,000

respectively.Thehighcoveragesetcomprised71samples;thelow coverageset70.Wenoticedthatinthelow-coveragenoiseprofiles, strandbiascanoccurespeciallyforthelow-percentagenoisethat isduetosingle-stranddrop-outofthisnoise.Thisisillustratedby theBGEstimatenoiseprofilesfortheCE10_TCTA[10]_-20T>Aallele forlocusD7S820inSupplementaryFig.4,inwhichforwardand reversereadsareingoodorreasonablebalanceforallsevennoise sequencesinthehighcoveragesamplesetwhilegoodbalanceis onlyseenforthetwomainnoisesequencesinthelowcoverageset. Sincethemostabundantnoiseaftercorrectioninasampleis usuallyintherangeof0.5–3%(forSTRanalysis),werecommenda coverageofatleast1000readsperlocus(whichrelatestoa24,000 totalreadcoverageforour24lociamplificationkit)forthesamples

of the reference database to obtain the most accurate noise

estimates.

3.1.4.Infrequentalleles

Depending on the composition of the reference database,

occasionallyalleleswillbeencounteredthatarenotincludedinthe database.BGPredict canpredict thenoise from stutterorother repeatingelementsbutcorrectionofothertypesofnoise(likelow levelSNPs causedby sequenceerrors)is not possibleforthese infrequentalleles.

WethereforerecommendtoobtainBGEstimatenoise profiles for asmany allelesas possible,while retaininggood qualityof thesenoiseprofiles.Severalfilteringcriteriacanbeapplied,suchas

the minimumnumber of different heterozygousgenotypes per

allele, the minimum number of samples per allele and the

minimumnumberofhomozygoussamplesperallele.Theeffectof increasingthestringencyonthefilteringcriteriaonthenumberof retrieved BGEstimatenoise profilesfor our429 referencesetis showninSupplementaryTable2.Thesettingsselectedforusein thisstudyare:atleasttwosamplesperallele(whichensuresnoise isnotbasedonasinglesampleasthatcouldbeanoutlier)that present at least three different heterozygous or at least one homozygote genotype (i.e., the samples can be three different

heterozygotesortwohomozygotesor onehomozygoteandone

heterozygote).

Whenanalleleat aheterozygouslocusfailsthecriteria,the completelocuscarrying thisallelecannotbeusedfor establish-mentofanoiseprofilesincethenoisecannotbeattributedtoany ofthetwoalleles.Thus,forbothallelesataheterozygouslocusno noiseprofileisextracted.

3.1.5.Accuracyofnoisereferencedatabaseandstuttermodel Toverifytheaccuracyofthenoiseprofilesobtainedthrough BGEstimateandBGPredict,itcanbeusefultocomparetheaverage noiseratioswiththenoiseratiosobservedinindividual homozy-gous samples. The noise ratios of all noise in all homozygous

referencesamplescan easilybecollectedusingtheBGHomRaw

tool.Thesedatapointscanbeplottedontopofanoiseprofileto inspecttheconsistencyandvariationinthenoiseratiosofvarious typesofnoiseforeachallele.InFig.3,thenoiseprofileofthemost frequentalleleof D7S820(CE10_TCTA[10]_-20T>A)is displayed,

which has foremost a 1 stutter (CE10_TCTA[9]_-20T>A) in

addition to a 1 nt slippage product at the A-stretch

Fig.3.NoiseprofileofD7S820alleleCE10_TCTA[10]_-20T>A.Thenoiseratioisshownforeachsystemicnoisesequenceobservedwithanoiseratioof0.1%orhigher. Individualobservationsinhomozygoussamples(above0.5%)aredisplayedascircles.Asexpected,themostfrequentlyobservednoisesequenceisthe1stutter,butsincethe allelecontainsasingle-nucleotidestretchof9Anucleotides,aconsiderableportionofthenoiseconsistsofsequenceswithslippageatthisA-stretch(oracombinationofthe two).

(9)

(CE9.3_TCTA[10]_-20T>-). The individual observations for the

homozygoussamples coincide nicely withthe estimated noise

profileratios.

Similarly, it is useful to compare the functions fitted by

Stuttermodel to the data points to which they were fitted.

Stuttermodelincludesanoptiontowritetherawdatapointstoa separateoutputfile,whichcanbevisualisedtogetherwiththefitted modelasshowninFig.4forD7S820.Thisexampleshowsthatthe

homozygouscallsandtheStuttermodelestimationfollowthesame trendandthatthereisnodiscrepancybetweenforwardandreverse reads.ThesameholdsfortheA-stretch(datanotshown).

In thestuttermodel, fits withanR2 score below0.75 were rejected.AlthoughthismayseemaverylowR2score,weobtained betterresultsbyincludingmorefitsthanbyexcludingthem,which wouldresultintheinabilityofthestuttermodeltobeusedtofilter andcorrectstutterfortherespectiverepeatunitsatall.

Fig.4.Stuttermodelforthe1stutterofD7S820.Onthex-axisthelengthoftherepeatisdisplayed(innucleotides)andonthey-axisthe1stutternoiseratio(aspercentage ofreadsoftheparentallele)isdisplayed.Eachhomozygousreferencesampleisdisplayedasadotandthelinesdisplaythefittedfunctionsusedforcalculatingtheexpected stutterofeachallele.

Fig.5.Sequenceprofileofasingle-sourcesample.SequenceprofileoflociD18S51andD19S433ofasingle-sourcesample.Asequenceprofiledisplaysthereadcountbefore correction(inpurplebars)andshowstheeffectsofnoisefiltering(lightpurpleforthereadsthatareremoved)andnoisecorrection(withthenoisereadsaddedtotheparent allelesindarkorange).Whenperformingcorrection,itispossiblethatanallelegainsreadsbecausethenoisereadsoriginatingfromthisalleleareadded,butlosesreadsatthe sametimesincethenoiseofanotheralleleintheprofileincludesreadsofthisallele.Thisoverlappingpartofaddedandremovedreadsismarkedseparatelyinlightorange. Thismeansthattheoriginalreadcountofanallelebeforecorrectionisthecombinationofthepurpleandthelightorangebar.Thelinesinthebarsindicatethestrandbalance; thelineisdrawnnearthetopofthebarifthemajorityofreadsofasequenceisontheforwardstrand,nearthebottomofthebarifthemajorityofreadsisonthereverse strand,andinthemiddleofthebarintheabsenceofstrandbias.Sequencesdisplayedingreeninthegraphsaretheallelesthatthesoftwareinferstobegenuineallelesinthe sample.Thesearealsodisplayedinthetable.(Forinterpretationofthereferencestocolourinthisfigurelegend,thereaderisreferredtothewebversionofthisarticle.)

(10)

3.2.Sampleanalysis

3.2.1.Allelecalling,interpretationandvisualisation

Whenareferencedatabasehasbeencreated,onecanproceed withtheanalysisofsamples.FDSToolsanalysessequencingdata, calls alleles and interprets the data by correction for noise as inferredfromthereferencedatabase.Resultscanberepresentedas agraphicalsequenceprofileoutputandasaninteractiveprofile report.

InFig.5,anexampleofasequenceprofileoftwolociofa single-sourcesampleisdisplayed(generatedbythecommand‘fdstools vissample’).Asequenceprofiledisplaysthereadcountsbeforeand aftercorrectionand visualisesthe effects of noise filteringand noisecorrection.Amoredetailedexplanationoftheinterpretation ofasequenceprofilecanbefoundinSupplementaryFig.5.

The interactive sequence profile reports provide separate

filtering options for the graphs and tables displayed (see

Section2.4.3).In thegraphs,all alleles that are hidden bythe filtering options are (optionally) aggregated as a separate bar (displayingthecumulativenumbersofreads)withthelabel‘other sequences’.Inaddition,weaggregateallsingletonreadsinto‘other sequences’alreadyinthefirststepoftheanalysis(using‘fdstools tssv–minimum2–aggregate-filtered’)whichhastheadditional benefitsofspeedingupsubsequentanalysisanddecreasingdata storagedemand.

3.2.2.Improvingheterozygotebalancethroughnoisecorrection TheamplificationoflongSTRallelesinthePCRisgenerallyless efficientthanshorterallelesand,inaddition,longSTRallelessuffer fromahigherdegreeofstutterresultinginreducedheterozygote balancebetweenthetwoalleles. [2]SinceFDSTools determines which‘noisereads’arederivedfromwhichparentalleles,these readscan(optionally)beaddedtothereadcountsoftheparent alleles,whichtheoreticallywillimprovetheheterozygotebalance.

When we examine the heterozygote allele balance in the 429

single-source reference samples, an improved heterozygote

balanceisindeedobservedwhenthestutterreadsareaddedto the read counts of the parent alleles (Table 2). Heterozygote balancewasdeterminedperlocusbydividingthereadcountsfor thelessfrequentallelesbythoseforthemorefrequentalleles,and takingtheaverageofall429samples.

3.2.3.Mixtureanalysis

For the analysis mixtures, noise correction may assist in

identifyingtheallelesofalowminorcontributor.Weused31 two-personmixtureswithminorcontributionsof50%,20%,10%,5%,1% and0.5%toassessthisexpectation.

We varied the ‘percentage of locus’ threshold (Table 1) for callingalleles,whichsetsalimittothemixtureproportion.When nonoisecorrectionwasappliedthethresholdwasvariedbetween

5.0% and 1.5%; when noise correction was applied, a lower

Table2

Heterozygotebalancefororiginal,filteredandcorrecteddatasets.Thereadcountsforthelessfrequentallelesaredividedbythoseforthemorefrequentalleles,andthe averageforall429single-sourcereferencesamplesistaken.

Dataset\locus Amel CSF1P0 D10S1248 D12S391 D13S317 D16S539 D18S51 D19S433 D1S1656 D21S11 D22S1045 D2S1338 Uncorrecteddata 0.83 0.88 0.82 0.79 0.88 0.87 0.85 0.84 0.88 0.89 0.85 0.77 Filtereddatawithoutnoisereadsaddedto

allelereadcount

0.83 0.90 0.87 0.80 0.89 0.89 0.87 0.88 0.89 0.90 0.88 0.78

Correcteddatawithnoisereadsaddedto allelereadcount

0.83 0.91 0.89 0.84 0.90 0.91 0.89 0.90 0.91 0.91 0.93 0.80

Dataset\locus D2S441 D3S1358 D5S818 D7S820 D8S1179 FGA PentaD PentaE TH01 TPOX vWA Uncorrecteddata 0.90 0.88 0.90 0.88 0.89 0.85 0.87 0.78 0.88 0.88 0.85 Filtereddatawithoutnoisereadsaddedtoallelereadcount 0.90 0.90 0.91 0.89 0.90 0.87 0.87 0.78 0.88 0.89 0.88 Correcteddatawithnoisereadsaddedtoallelereadcount 0.91 0.91 0.91 0.91 0.91 0.89 0.88 0.81 0.89 0.90 0.90

Table3

Averagenumberofdrop-inallelesandaveragedrop-outpercentagepersamplefordifferent‘percentageoflocus’allele-callingthresholds. a)Summaryofdrop-inanddrop-outratesforvariousallele-callingthresholds

Minorcontribution 50%:600pg 20%:240pg 10%:120pg 5%:60pg 1%:60pg 0.5%:60pg Analysismethodand

threshold #drop-in/ %drop-out #drop-in/ %drop-out #drop-in/ %drop-out #drop-in/ %drop-out #drop-in/ %drop-out #drop-in/ %drop-out Withoutcorrection,5.0% 0.7/0.0% 0.8/2.1% 1.2/25.0% 1.0/33.1% 1.7/39.9% 0.8/40.8% Withoutcorrection,2.5% 4.3/0.0% 3.8/0.0% 4.3/2.5% 4.2/21.6% 3.5/38.3% 2.3/40.4% Withoutcorrection,2.0% 5.3/0.0% 5.3/0.0% 5.3/0.5% 4.8/15.3% 4.2/38.1% 2.8/40.1% Withoutcorrection,1.5% 7.7/0.0% 7.7/0.0% 7.0/0.0% 6.0/10.8% 6.0/37.4% 4.7/39.7% Withcorrection,3.0% 0.0/0.0% 0.3/0.0% 0.3/5.7% 0.0/30.3% 0.7/41.1% 0.3/41.1% Withcorrection,2.5% 0.3/0.0% 0.3/0.0% 0.5/1.4% 0.0/25.1% 0.7/40.6% 0.3/41.1% Withcorrection,2.0% 0.7/0.0% 1.2/0.0% 1.8/0.5% 0.8/17.4% 0.8/40.6% 1.2/40.8% Withcorrection,1.5% 1.7/0.0% 2.3/0.0% 2.5/0.0% 2.0/10.8% 2.7/39.9% 2.3/40.8% Withcorrection,1.0% 4.0/0.0% 5.2/0.0% 4.5/0.0% 4.5/3.8% 5.7/37.8% 3.2/40.6% Withcorrection,0.5% 16.3/0.0% 17.3/0.0% 14.0/0.0% 15.5/2.4% 14.0/30.3% 11.0/37.4% b)Categoriseddrop-outrateswhenusing1.5%allele-callingthreshold(withcorrection)

Minorcontribution 20%:240pg 10%:120pg 5%:60pg 1%:60pg 0.5%:60pg Allelesuniquetotheminor(homozygous) 0.0% 0.0% 0.0% 91.3% 100.0% Allelesuniquetotheminor(heterozygous) 0.0% 0.0% 31.6% 98.1% 99.4%

Allelesuniquetominor 0.0% 0.0% 27.0% 97.2% 99.4%

Allallelesoftheminor 0.0% 0.0% 18.5% 67.7% 69.3%

(11)

threshold could be used, varying between 3.0% to 0.5%. We comparedthevariousmethodsbycalculatingpercentagemissed alleles(a.k.a.drop-out)andthenumberoferroneousallelecalls(a. k.a.drop-in).Thepercentagedrop-outwascalculatedbydividing thenumber of donor alleles not calledby thetotal number of possiblealleles (homozygousand shared alleles arecountedas one,Amelisincluded),andthepercentagewasaveragedforthe mixtureswiththesamemixtureratio.Drop-inispresentedinthe averagenumberoccurringinprofileswiththesamemixtureratio. InTable 3, theresultsof theseanalyses aredisplayedand it is obviousthat withoutcorrectionmoredrop-inalleles occurthat mostlyrepresentstutters.Consequently,thethresholdforcalling anallelecanbelowerwhencorrectionisapplied,aslessstutters remaininthecorrectedprofilethatcanbewrongfullycalledasan allele.Asexpected,thepercentageofdrop-outdependslargelyon the‘percentageoflocus’thresholdforallelecalling(andhardlyon theapplication of noise correction); drop-out is more frequent withahigher(morestringent)threshold.Whenthethresholdfor correcteddataisdecreasedbelow1.5%,thenumberof drop-ins rapidly increases for all ratios. Not surprisingly, the drop-out percentageforthemixtureswith1%and0.5%isveryhighwhen

using a threshold that is higher than the minor component.

Therefore,thedatafromthe5%and10%minorcontributionwas usedtodeterminetheoptimalthresholdforallelecalling.

InFig.6,weshowtherelationbetweenthe‘percentageoflocus’ allele-calling threshold, drop-out and drop-in for the mixtures

with a 5% or 10% minor contribution. In the used dataset, a

threshold of 1.5% appears to be the most effective for calling genuine alleles in mixtures with minimal erroneous calling of remainingnoiseinthemixtures.Withmixtureratiosdownto10%,

no drop-out and only minimal drop-in is observed with this

threshold,whereaswithcontributionssmallerthan10%anoptimal balancebetweendrop-outanddrop-inisachieved(Table3).When investigating the noise that is erroneously called using this thresholditisapparentthatthedrop-inallelesarerarelyresulting fromstutterbutalmostexclusivelyconsistofPCRhybrids[20].In

Table3b,thepercentageofdrop-outwhenusingthe1.5% allele-callingthresholdiscategorisedandillustratesthatdrop-outalleles consistmostlyofheterozygousminoralleles.Mostofthese drop-out alleles represent minor contributions on stutter positions (wherestutterratioofthemajorcontributorwaslowerthanthe averageobservedinthesetofreferencesamples,therebycausing over-correction) and long alleles that suffer fromheterozygote imbalance.InFig.6thetrendsfromTable3areconfirmed: drop-outis hardlyand drop-inislargelyaffectedbytheuseofnoise correction. Thus, calling of genuine alleles is not negatively influencedbynoisecorrection.

Notethatthenumberofdrop-insmaybereducedfurtherby applying additional thresholds fromTable 1, but the effects of varyingadditionalthresholdvalueswerenotstudiedindepth.

InFig.7a, b,theeffectofnoisecorrectiononallelecallingis shown for ahighly unbalancedmixture (95:5mixtureratio)in whichtheallelesoftheminorcontributorhaveasimilarorlower read countthanthestutterproductsoftheallelesof themajor

contributor. Without noise correction the four most frequent

sequence variants are the major contributor’s alleles and the corresponding1stuttersandinterpretationofthelessfrequent sequencevariantsbecomesintractable;afternoisecorrection,the stutterproductsandotherPCRartefactsarefilteredoutandfour sequence variants meet the ‘percentage of locus’ allele-calling thresholdof1.5%,whichcorrespondtothefourallelesofthetwo heterozygousdonors.Also,theallelesofboththemajorandminor contributorhavegainedrecoveredreads.

Thisexamplealsodisplaysa pitfallof theinterpretationofa mixedDNAprofilewherethemajorcontributorhasaninfrequent alleleforwhichnoBGEstimatenoiseprofileisavailable.Thenoise correction ofalleleCE22_TAGA[16]CAGA[6]isonlybasedonthe stutter model (using BGPredict), which fails to correct for the CE22_TAGA[17]CAGA[5]PCRartefact.Lookingatthenoiseprofile

of the most resembling allele in the reference database,

CE21_TAGA[15]CAGA[6], we find a similar PCR artefact

CE21_TAGA[16]CAGA[5] thatrepresents a C toTsubstitution at

Fig.6.Averagenumberofdrop-inandpercentageofdrop-outpersamplefordifferent‘percentageoflocus’allele-callingthresholds.Effectofdifferent‘percentageoflocus’ allele-callingthresholdsonthedrop-inanddrop-outrates.Thenumbersnexttothepointsdisplayallele-callingthresholds.Thepositionofeachpointillustratesthenumber ofdrop-insandpercentageofdrop-outforthecorrespondingthresholdinmixtureswitharatioof90:10(orange)and95:5(blue).Pointsconnectedbydashedlines correspondtoresultsobtainedwithoutnoisecorrection,pointsconnectedbysolidlinescorrespondtoresultsobtainedafternoisecorrection.The1.5%‘percentageoflocus’ allele-callingthresholdthatappearsmostoptimalisindicatedinbold.(Forinterpretationofthereferencestocolourinthisfigurelegend,thereaderisreferredtotheweb versionofthisarticle.)

(12)
(13)

thefirstCAGArepeatunit,withanoiseratioofabout2%(Fig.7c). ThissuggeststhattheCE22_TAGA[17]CAGA[5]artefactwouldbe properlycorrectedifanoiseprofilefortheCE22_TAGA[16]CAGA[6] allelewouldbeavailable.Thus,additionalinspectionoftheapplied

method of correction (BGPredict or BGEstimate) may be useful

wheninfrequentallelesoccur.

3.3.Analysistimeandcomputermemorydemand

Toindicatetherequiredtimeandcomputermemorydemand,

fivesampleswithdifferentnumbersofreads(15,169–318,403total readpairs)wereanalysedandthetimeandpeakmemoryusagefor eachseparatetoolwasregistered(SupplementaryFig.6).Withthe used2.0GHzprocessor,theanalysistimeismostlyconsumedby TSSV(75%ofthetotalanalysistime)andthecompleteanalysis

only takes up to 13:30min for a sample with 318,403 reads.

BGCorrectshows thehighest peak memoryusagebut doesnot

exceed200MBforthelargestsample(ofthefivetestedsamples). Boththerequiredtimeandmemoryincreasemoreorlesslinearly whenthereadcountoftheanalysedsamplesisincreased. 4.Conclusions

WedevelopedFDSToolsfortheanalysisofforensicMPSdata.

FDSTools can determinesystemic PCR and/orsequencing noise

fromthedataofreferencesamples,buildadatabasefromthisdata and use it to correct for systemic noise in case samples. The softwareis also abletopredictthe noise caused bystutterfor

alleles not included in the reference database and uses this

informationinthecorrectionofcasesamples.

Withautomaticthreshold-basedallelecalling,noisecorrection reducestheoccurrenceofdrop-inanddrop-outsubstantiallyand improvesthebalancebetweenallelesofaheterozygotepair.This decreasesthedetectionlimitsofminorcontributionsinmixtures. STRstuttervariantsarenolongerthemostfrequentremaining noiseasPCRhybridartefactsnowgenerallyexceedthecorrected readcountsofstutters.

Althoughreliablenoisecorrectioncanalreadybeobtainedfrom adatabaseof100samples,alargerdatabaseispreferredasalarger numberofallelescanbecorrectedbytheuseofacompletenoise profileinsteadofrelyingonnoisepredictionsbasedonthestutter model.Thiswillalsoreducemanualinspectionofretained non-stutternoiseforinfrequentalleles.Whenbuildingthedatabase,it isimportanttouseanamplificationkitrepresentativeforthekit

used for the samples. Although not extensively tested, we

anticipate that noise prediction will be less precisewhen less DNAis used andmore stochasticPCR effects occur.Also, more strandbiaswilloccurduringthemassivelyparallelsequencing. Theseeffectsareintrinsictolow-levelDNAtypingand

probabilis-ticgenotypingsoftwarehasbeendevelopedthataccommodates

drop-inanddropoutduringprofileinterpretation[21–28].Such softwarearenotyetstraightforwardlyabletodealwithMPSdata, butthenecessaryadaptationsarefeasibleandinclude nomencla-tureforsequencevariants,allelefrequenciesdatabasesandread countsreplacingpeakheightsincontinuousmodels(notrequired forsemi-continuousmodels).InCE-basedanalysis,PCRreplicates areoftenusedtoreduceprofilinguncertainty[7];replicatescanbe enteredinprobabilisticgenotypingsoftwareorusedtopreparea

consensusprofile[7,8].AfutureversionofFDSToolswillfeaturea consensus-basedanalysismethodalikethoseusedwithCEdata [8]. Besides, export options for DNA databasesystems such as CODISwillbeadded.

FDSTools hasbeenvalidatedfollowing recommendations for

software validation [29,30]and is already implemented in the

ISO17025 certified environment of the LUMC for forensic

casework.Thevalidationforuseandperformanceofthesoftware incaseworkwasaseparatestudywhichwasnotbasedonthedata described inthismanuscript.By providingtoolstoevaluatethe performance of noise correction in referencesamples FDSTools facilitatesthedeterminationofanalysisthresholdsthatarefitfor purpose.

TheapplicationofFDSToolsisnotlimitedtotheanalysisofSTRs. FDSToolshasalreadybeenappliedsuccessfullytotheanalysisof multiplexassaysofSNPfragments(manuscriptinpreparation)and

complete mtDNAdata[31].Notethat theminimumnumber of

requiredreferencesamplesforlociotherthanSTRswilldependon theamountofvariationobservedintheseloci.

Acknowledgements

This study was supported by a grant from the Netherlands

Genomics Initiative/Netherlands Organization for Scientific

Re-search (NWO) withinthe frameworkof the ForensicGenomics

ConsortiumNetherlands.TheauthorswishtothankArieKoppelaar (NetherlandsForensicInstitute)forprovidinghelpandexpertisein setting upthecomputerclusterfortheanalyses.Wealsothank

Martin Ensenberger, Cynthia Sprecher and Douglas R. Storts

(Promega Corporation)for theircontributions todesigning and providingthePowerSeqTMAutoSystem.

AppendixA.Supplementarydata

Supplementarydataassociatedwiththisarticlecanbefound,in

the online version, at http://dx.doi.org/10.1016/j.

fsigen.2016.11.007. References

[1]M.A.Jobling,P.Gill,Encodedevidence:DNAinforensicanalysis,Nat.Rev. Genet.5(10)(2004)739–751.

[2]K.J.vanderGaag,R.H.deLeeuw,J.Hoogenboom,J.Patel,D.R.Storts,J.F.J.Laros, P.deKnijff,Massivelyparallelsequencingofshorttandemrepeats—Population dataandmixtureanalysisresultsforthePowerSeqTMsystem,ForensicSci.Int.

Genet.24(2016)86–96.

[3]K.B.Gettings,K.M.Kiesler,S.A.Faith,E.Montano,C.H.Baker,B.A.Young,R.A. Guerrieri,P.M.Vallone,Sequencevariationof22autosomalSTRlocidetected bynextgenerationsequencing,ForensicSci.Int.Genet.21(2016)15–21. [4]C. Gelardi,E.Rockenbauer, S.Dalsgaard,C. Børsting,N.Morling,Second

generationsequencingofthreeSTRsD3S1358:D12S391andD21S11inDanes andanewnomenclatureforsequencedSTRalleles,ForensicSci.Int.Genet.12 (2014)38–41.

[5]M.Scheible,O.Loreille,R.Just,J.Irwin,Shorttandemrepeattypingonthe454 platform:strategiesandconsiderationsfortargetedsequencingofcommon forensicmarkers,ForensicSci.Int.Genet.12(2014)104–119.

[6]X.Zeng,J.L.King,M.Stoljarova,D.H.Warshauer,B.L.LaRue,A.Sajantila,J.Patel, D.R.Storts,B.Budowle,Highsensitivitymultiplexshorttandemrepeatloci analyseswithmassivelyparallelsequencing,ForensicSci.Int.Genet.16(2015) 38–47.

[7]C.C.Benschop,C.P.vanderBeek,H.C.Meiland,A.G.vanGorp,A.A.Westen,T. Sijen,LowtemplateSTRtyping:effectofreplicatenumberandconsensus

Fig.7.Interpretationofamixedsequenceprofilebeforeandaftercorrection.a)SequenceprofileoflocusD12S391ofamixedsamplewitharatioof95:5,withoutnoise filteringandcorrection.b)SequenceprofileoflocusD12S391ofamixedsamplewitharatioof95:5,withnoisefilteringandcorrectionapplied.c)NoiseprofileofD12S391 alleleCE21_TAGA[15]CAGA[6].SequenceprofileoflocusD12S391ofamixedsamplewitharatioof95:5,AwithoutandBwithapplyingnoisefilteringandcorrection.The tabledisplaysallsequencevariantswithatleast1.5%ofthereadsofthelocus(the‘percentageoflocus’allele-callingthresholdused),whicharealsomarkedingreeninthe graph.AnoteinthetableinpanelBwarnsthatnonoiseprofilewasavailableforthemajorCE22_TAGA[16]CAGA[6]alleleandastutterpredictionhasbeenusedinstead.An additionalvariantCE22_TAGA[17]CAGA[5],whichisderivedfromthemajorCE22allele,remainsvisibleinthesequenceprofile(althoughnotmarkedgreenasitdoesnot meetthe1.5%threshold).CNoiseprofileofasimilarallele,showinganon-stutterPCRartefactwithanoiseratioofabout2%.(Forinterpretationofthereferencestocolourin thisfigurelegend,thereaderisreferredtothewebversionofthisarticle.)

(14)

methodongenotypingreliabilityandDNAdatabasesearchresults,Forensic Sci.Int.Genet.5(2011)316–328.

[8]C.C.Benschop,T.Sijen,LoCIM-tool:anexpert’sassistantforinferringthemajor contributor’sallelesinmixedconsensusDNAprofiles,ForensicSci.Int.Genet. 11(2014)154–165.

[9]C. Brookes,J.A. Bright,S.Harbison,J. Buckleton,Characterisingstutterin forensicSTRmultiplexes,ForensicSci.Int.Genet.6(2012)58–63. [10]S.Y.Anvar,K.J.vanderGaag,J.W.vanderHeijden,M.H.Veltrop,R.H.Vossen,R.

H.deLeeuw,C.Breukel,H.P.Buermans,J.S.Verbeek,P.deKnijff,J.T.den Dunnen,J.F.J.Laros,TSSV:atoolforcharacterizationofcomplexallelicvariants inpureandmixedgenomes,Bioinformatics30(2014)1651–1659. [11]D.H.Warshauer,J.L.King,B.Budowle,STRaitrazorv2.0:theimprovedSTR

alleleidentificationtool–razor,ForensicSci.Int.Genet.14(2015)182–186. [12]S.L.Friis,A.Buchard,E.Rockenbauer,C.Børsting,N.Morling,Introductionof

thePythonscriptSTRinNGSforanalysisofSTRregionsinFASTQorBAMfiles andexpansionoftheDanishSTRsequencedatabaseto11STRs,ForensicSci. Int.Genet.21(2016)68–75.

[13]A.A.Westen,T.Kraaijenbrink,E.A.RoblesdeMedina,J.Harteveld,P.Willemse, S.B.Zuniga,K.J.vanderGaag,N.E.Weiler,J.Warnaar,M.Kayser,T.Sijen,P.de Knijff,ComparingsixcommercialautosomalSTRkitsinalargeDutch populationsample,ForensicSci.Int.Genet.10(2014)55–63.

[14]M.G.Ensenberger,K.A.Lenz,L.K.Matthies,G.M.Hadinoto,J.E.Schienman,A.J. Przech,M.W.Morganti,D.T.Renstrom,V.M.Baker,K.M.Gawrys,M. Hoogendoorn,C.R.Steffen,P.Martín,A.Alonso,H.R.Olson,C.J.Sprecher,D.R. Storts,DevelopmentalvalidationofthePowerPlex1Fusion6CSystem, ForensicSci.Int.Genet.21(2016)134–144.

[15]T. Mago9c, S.L.Salzberg,FLASH: fastlengthadjustment ofshort readsto improvegenomeassemblies,Bioinformatics27(2011)2957–2963. [16]B.C.Hendrickson,B.Leclair,S.Forrest,J.Ryan,B.E.Ward,D.Petersen,T.D.

Kupferschmid,T.Scholl,AccurateSTRalleledesignationsattheFGAandvWA locidespitesitepolymorphisms,J.ForensicSci.49(2)(2004)250–254. [17]D.Shinde,Y.Lai,F.Sun,N.Arnheim,TaqDNApolymeraseslippagemutation

ratesmeasuredbyPCRandquasi-likelihoodanalysis:(CA/GT)nand(A/T)n microsatellites,Nucl.AcidsRes.31(2003)974–980.

[18]M.Klintschar,P.Wiegand,Polymeraseslippageinrelationtotheuniformityof tetramericrepeatstretches,ForensicSci.Int.135(2)(2003)163–166. [19]A.Satyanarayan,R.Russell,J.Hoffswell,J.Heer,ReactiveVega:astreaming

dataflowarchitecturefordeclarativeinteractivevisualization,IEEETrans.Vis. Comput.Graph.22(2016)659–668.

[20]A.R.Shuldiner,A.Nirula,J.Roth,HybridDNAartifactfromPCRofclosely relatedtargetsequences,Nucl.AcidsRes.17(11)(1989)4409.

[21]H.Haned,P.Gill,AnalysisofcomplexDNAmixturesusingtheForensim package,ForensicSci.Int.Genet.Supp.Ser.3(2011)e79–e80.

[22]H.Swaminathan, A. Garg, C.M. Grgicak, M.Medard, D.S. Lun, CEESIt:a computationaltoolfortheinterpretationofSTRmixtures,ForensicSci.Int. Genet.(2016)149–160.

[23]D.J. Balding, Evaluation of mixed-source, low-template DNA profiles in forensicscience,Proc.Natl.Acad.Sci.U.S.A.110(30)(2013)12241–12246. [24]D.Taylor,J.A.Bright,J.Buckleton,Theinterpretationofsinglesourceandmixed

DNAprofiles,ForensicSci.Int.Genet.7(2013)516–528.

[25]R.G.Cowell,T.Graversen,S.L.Lauritzen,J.Mortera,AnalysisofforensicDNA mixtureswithartefacts,Appl.Stat.64(1)(2015)1–32.

[26]O.Bleka,G.Storvik,P.Gill,EuroForMix:anopensourcesoftwarebasedona continuousmodeltoevaluateSTRDNAprofilesfromamixtureofcontributors withartefacts,ForensicSci.Int.Genet.21(2016)35–44.

[27]M.W.Perlin,M.M.Legler,C.E.Spencer,J.L.Smith,W.P.Allan,J.L.Belrose,B.W. Duceman,ValidatingTrueAllele1DNAmixtureinterpretation,J.ForensicSci. 56(2011)1430–1447.

[28]R.Puch-Solis,L.Rodgers,A.Mazumder,S.Pope,I.W.Evett,J.Curran,D.Balding, EvaluatingforensicDNAprofilesusingpeakheightsallowingformultiple donors,allelelicdropoutandstutters,ForensicSci.Int.Genet.7(2013)555– 563.

[29]H.Haned,P.Gill,K.Lohmueller,K.Inman,N.Rudin,Validationofprobabilistic genotypingsoftwareforuseinforensicDNAcasework:definitionsand illustrations,Sci.Justice56(2)(2016)104–108.

[30]E.J.Rykiel,Testingecologicalmodels:themeaningofvalidation,Ecol.Model. 90(1996)229–244.

References

Related documents

a) Classes I to V, b) Classes VI to VIII and c) Classes IX to XII iii. Topics are given in Annexure 1. Students may submit their entries as Essay/ Poem/ Drawing. Entries may

2 HEINEKEN Asia Pacific: Uniquely Positioned to Win 3 APB: Strong Growth Momentum. 4 Regional

Finally, the complete list of contributions obtained as the outcomes from this thesis are detailed: the EHU-OEF facility, the NV taxonomy, the L2PNV proposal, the MACP protocol,

Improving medication management in multimorbidity: development of the Multimorbidity Collaborative Medication Review and Decision-making (MY COMRADE) intervention using the

This high mobile phone ownership on the island is reflected in Indigenous Australia generally: “Telstra’s own figures have shown that the introduction of mobile telephony

As shown by equation (6), the unemployment gap depends again on the average job finding and separation rates: a higher structural unemployment rate leads to more business cycle

The purpose of this dissertation was twofold: 1) to systematically review studies of the influence of friendship networks on adolescents’ risky behaviors, which utilized

Therefore the objective of this study was to determine the impact of replacing corn with increasing inclusions of algae meal on total tract nutrient digestibility in