Genomic Selection in Plant Breeding: Methods, Models, and Perspectives

(1)

Feature

Review

Genomic

Selection

in

Plant

Breeding:

Methods,

Models,

and

Perspectives

José

Crossa,

1,

* Paulino

Pérez-Rodríguez,

2

Jaime

Cuevas,

3

Osval

Montesinos-López,

4

Diego

Jarquín,

5

Gustavo

de

los

Campos,

6

Juan

Burgueño,

1

Juan

M. González-Camacho,

2

Sergio

Pérez-Elizalde,

2

Yoseph

Beyene,

1

Susanne

Dreisigacker,

1

Ravi

Singh,

1

Xuecai

Zhang,

1

Manje

Gowda,

1

Manish

Roorkiwal,

7

Jessica

Rutkoski,

8

and

Rajeev

K. Varshney

7,

*

Genomic selection (GS) facilitates the rapid selection ofsuperior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles,andbasisofGSandgenomic-enabledprediction(GP)aswellasthe geneticsandstatistical complexitiesofGPmodels,includinggenomic geno-typeenvironment(GE)interactions.WealsoexaminetheaccuracyofGP modelsand methods fortwo cereal crops and two legumecrops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement(i.e.,prebreeding)programscould acceleratetheﬂowofgenes fromgene bank accessionsto elitelines. Recent advances in hyperspectral imagetechnologycouldbecombinedwithGSandpedigree-assistedbreeding. TheRole ofGenomic-Enabled PredictioninPlant Breeding

Beginningduringthe1980s,thedevelopmentofdifferentmolecularmarkersystemsdrastically increased the total number of polymorphic markers available to plant breeders, and to molecularbiologistsingeneral.Themostnotablehigh-throughputgenotyping(HTG)system issinglenucleotidepolymorphisms(SNPs),whichhavebeenusedintensivelyinquantitative traitlocus(QTL;see Glossary)discovery.More than 10000 QTLs usingdifferentmarker systemshavebeenreportedinmorethan120studiescovering12plantspecies[1]thataimed toimprovequantitativetraitsofeconomicimportance.Initially,molecularmarkerswere inte-gratedintraditionalphenotypicselection(PS)byapplyingmarker-assistedselection(MAS). Forsimpletraits,MAScomprisesselectingindividualswithQTL-associatedmarkersthathave majoreffects;markersnotsignificantlyassociatedwithatraitarenotused.However,attempts toimprovecomplexquantitativetraitsbyusingQTL-associatedmarkerdetectionhavebeen unsuccessfulduetothedifficultyoffindingthesameQTLacrossmultipleenvironments(dueto QTLenvironmentinteractions)orindifferentgeneticbackgrounds[2].

LinkageanalysisforQTLmappingisdoneonbiparentalpopulations,buthaslowpowerfor detectingmarker–traitassociationduetochromosomeswithlowrecombinationrates. There-fore,associationmappingstartedduringtheearly2000swiththeobjectiveofovercomingthe lowpower oflinkage analysis,thusfacilitating thedetectionof marker–trait associations in

Trends

Inrecentyears,theglobalclimatehas changed,resultingindrastic ﬂuctua-tionsinrainfallpatternsandincreasing temperature.Suddenclimatechanges cancausesigniﬁcanteconomiclosses tocountriesworldwide.

Geneticimprovementofseveral eco-nomicallyimportantcropsduringthe 20thcenturyusingphenotypic, pedi-gree,andperformancedatawasvery successful. However, signs of grain yieldstagnationinsomecrops, espe-ciallyindrought-stressed and semi-aridregions,areevident.

Genomicselectionoffersthe opportu-nitytoincreasegrainproductioninless time.InternationalMaize and Wheat ImprovementCenter(CIMMYT)maize breeding research in Sub-Saharan Africa,India,andMexicohasshown thatgenomicselectioncanreducethe breedingintervalcycletoatleasthalf theconventionaltime andproduces linesthat,inhybridcombinations, sig-niﬁcantly increasegrain yield perfor-mance over that of commercial checks.

Publicandprivateinvestmentincrop genomic selection research should increasetosuccessfullydevelopinless time germplasm that is adapted to suddenclimatechange.

(2)

nonbiparentalpopulationsandﬁne-mappingchromosomesegmentswithhighrecombination rates.However,themainproblemofﬁne-associationmappingisthelowpowerfordetecting rarevariantsthatmaybeassociatedwitheconomicallyimportanttraits[2].Thus,thechallenge ofassociationmappingandQTLdetectionresidesinidentifyingandquantifyingrareQTLswith smalleffects for economicallyimportant traits thatare highly affectedby theenvironment. However,becausethecostofSNPassayshasdramaticallydecreased,thepossibilityofusing high-densitySNP arrays(tensofthousands) hasresulted inthe developmentofstatistical modelstopredictmarker–traitassociationaccurately,dependingonthegeneticarchitectureof thepredictedtrait.

Contrary to QTL and association mapping,GS uses all molecular markers for GP of the performanceofthecandidatesforselection.Therefore,theaimofGSistopredictbreeding and/orgeneticvalues.GScombinesmolecularandphenotypicdatainatrainingpopulation (TRN)toobtainthegenomicestimatedbreedingvalues(GEBVs’)ofindividualsinatesting population(TST)thathavebeengenotypedbutnotphenotyped[3].Figure1Adepictsthetwo basicpopulationsinaGSprogram:theTRNdata,whosephenotypeandgenotypeareknown, andtheTSTdata,whosegeneticvaluesaretobepredicted.GSisusedinplaceofphenotyping forafew selectioncycles.Themainadvantages ofGS overphenotype-basedselection in breedingarethatitreducesthecostpercycleandthetimerequiredforvarietydevelopment.In termsofcostreductioninmaizebreeding,thebreedercantestcross50%ofallavailablelines, evaluatingtheminfirst-stagemulti-locationaltrials,andcanthenusethephenotypicdatato predictthe remaining 50% byGS. Figure 1B showsthe advantageof GS overPS for:(i) reducingcosts,upto50%;and(ii)savingtimebyselectinglinesdirectlyforstageIIinstead goingthroughstageI(usedinPS).Thissignificantlyreducesthecostoftestcrossformationand evaluationateachstageofmulti-locationevaluations.ThetimeefficiencyoverPScouldcome fromthesecondcycleofselection,whichusestheTRNfromthepreviouscycletopredictthe new doubled haploid (DH) lines, thus excluding testcross formation and first-stage multi-locationevaluationtrials.BasedonGS,thebestlinescouldgodirectlytothesecondstage ofmulti-locationevaluations.

GSpredictsthebreedingvalues(BVs)ofthecandidatesforselection.BVshavetwo compo-nents:the parentalaverage (the mean BV of both parents) and the deviation of progeny performancefromthisaveragethatisduetoMendeliansampling.Inconventionalbreeding,the parentalaverageisquantifiedbypedigreeinformation(ifthegenealogyisavailable),fromwhich arelationshipmatrixAbetweentheindividualscanbederived.Mendeliansamplingassesses within-familyvariationthatisquantifiedbytestingtheprogenyinmultienvironmentfieldtrials. GStakesadvantageofdensemarkerstoquantifyMendeliansampling,thusavoidingtheneed toextensivelyphenotypethe progeny.Thissavestimeby reducingthe cyclelength,while enhancingtheexpectedgeneticgainandtheselectionresponseperunittime;italsouses lessresourcescomparedwithextensivephenotyping.GShasthepotentialtoquicklyimprove complextraitswithlowheritabilityaswellastosignificantlyreducethecostoflineandhybrid development.GScanalsobeusedforsimpletraitswithhigherheritabilitythancomplextraits, forwhichhighGP accuracyisexpected.TheapplicationofGS inplant breedingcouldbe limitedbytwomainfactors:(i)genotypingcosts;and(ii)unclearguidelinesastowhereGScan beefficientlyappliedinabreedingprogram.

GSandGP havebeenappliedusingtwo differentapproaches.Onefocusesonpredicting additiveeffectsinearlygenerationsofabreedingprogram(F2:3)toachievearapidselection

cyclewithashortinterval(i.e.,GSattheF2levelofabiparentalcross).Inthiscase,researchers

are interested in predicting the additive values (BVs) rather than the total genetic value; therefore, additive linear models that summarize the effects of the markers are sufﬁcient. The other approach predicts the complete genetic values of individuals considering both

1

InternationalMaizeandWheat ImprovementCenter(CIMMYT),Apdo. Postal6-641,06600,MexicoCity, Mexico

2

ColegiodePostgraduados, Montecillo,Texcoco,56230,Edo.de Mexico,Mexico

3

UniversidaddeQuintanaRoo, QuintanaRoo,77019,Mexico

4

FacultaddeTelemática,Universidad deColima,Colima,28040,Mexico

5

DepartmentofAgronomyand Horticulture,Universityof Nebraska-Lincoln,321KeimHall,Lincoln,NE 68503-0915,USA

6

DepartmentofEpidemiology& Biostatistics,MichiganState University,909FeeRoad,Room B601,EastLansing,MI48824,USA

7

InternationalCropsResearchInstitute fortheSemi-AridTropics(ICRISAT), Patancheru502324,Telangana,India

8

InternationalRiceResearchInstitute, LosBaños,4030,Philippines

*Correspondence:

[email protected](J.Crossa)and

[email protected]

(3)

Glossary

Breedingvalue(BV):theworthof anindividualasestimatedbythe averageperformanceofits progenies.

Array:high-throughputgenotyping platformcontainingeither genome-wideSNPs(SNParray)orcoding variants(exomearray). Geneticgain:theamountof increaseinperformancethatis achievedthroughgenetic improvementprogramsbetween consecutiveselectioncycles. Genomicestimatedbreeding value(GEBV):breedingvalueofan individualcalculatedusing genome-widemarkerdatatocapturesmall geneticeffectsdispersedoverthe genome.

Genomicselection(GS):atypeof breedingmethodologythatselects thebestcandidatesasparentsfor thenextselectioncyclebasedon theirpredictedbreedingvalues computedgiven:(i)theirgenotypes; and(ii)thephenotypesand genotypesoftheirrelatives. Genotyping-by-sequencing(GBS): ahighlymultiplexgenotypingsystem thatinvolvesDNAdigestionwith enzymesfollowedbyconstructionof areducedrepresentationlibrary, whichissequencedusinga next-generationsequencingplatform. Heteroticgroup:agroupof genotypes,irrespectiveoftheir geneticrelatedness,thatdisplay similarcombiningabilityandheterotic responsewhencrossedwith genotypesfromothergenetically distinctgermplasmofthesame group.

Linkagedisequilibrium(LD):the nonrandomassociationsofallelesat twolocithatcanoccurevenifthe twogenesareunlinked.

Marker-assistedselection(MAS): aprocessforselectingindividuals basedontrait-linkedmarkers. Marker-assistedrecurrent selection(MARS):atypeofMAS thatallowsthesimultaneous identiﬁcationofassociatedallelesas wellastheaccumulationofsuperior allelesfrombothparents. Quantitativetraitlocus(QTL):a gene(DNAsectionalsocalledlocus) thatisassociatedwithvariationina phenotype. (A) Phenotype TRN populaon TST populaon Genotype with genome-wide markers Develop a predicon equaon Predict breeding values for individuals in TST Genotype with genome-wide markers

TRN and TST populaons in genomic selecon

(B)

Phenotypic selecon Phenotypic + genomic selecon Year 1 1 2 2 3 3 4 4 5 5 6 6 B B B B B B A A A A A A A x B A x B F1 - inducer F1 - inducer Haploids Haploids Haploids Haploids

DH per se – seed increase

DH per se – seed increase + genotyping

DH per se – seed increase + genotyping DH for TCs – stage I

DH for TCs – stage I DH for TCs – stage II GS model

GS model 50% in cold room 50% DH for TCs – stage I

First stage mullocaon evaluaon of TCs

First-stage mullocaon evaluaon of TCs

Second-stage mullocaon evaluaon of TCs

Line C x line D Line C x line D

F1 - inducer F1 - inducer

Season

Figure1. PopulationsUsedinGenomicSelectionandaSchemeofPhenotypicandGenomicSelectionin MaizeBreeding.(A)Genomicselection(GS) requiresa trainingpopulation (TRN)thathas beengenotypedand phenotypedandatestingpopulation(TST)thathasonlybeengenotypedbutnotphenotyped.(B)Reductionofcycle lengththroughGSinmaizeusingdoubledhaploids(DH)crossedwithatester(TC).

(4)

additiveand nonadditive(dominanceand epistasis) effects, thereby estimating the perfor-mance (commercial value) of thecultivars. Geneticvalues oflines are predictedfor some environmentsusinganincomplete(sparse)multienvironmenttestingscheme.

Several genetic and statistical factors complicate the practical application of GP. Genetic difﬁcultiesarisefromthesizeanddiversityoftheTRNpopulationandtheheritabilityofthetraits tobepredicted.Statisticalchallengesarerelatedtothehighdimensionalityofmarkerdata, wherethenumberofmarkers(p)ismuchlargerthanthenumberofobservations(n)(p>>n)and themulticolinearityamongmarkers(adjacentmarkersarehighlycorrelated).Formoredetails, see‘Thecomplexityofgenomicselectionandprediction’and‘Solutiontoaninverseproblem’ inthesupplementaryinformationonline.

Here, we reviewadvances inGS and GP theory inlight of theabove considerations and evaluaterecent examples from GP applied to cerealand legume breeding programs.We describetheevolutionandmainfeaturesofGPmodels,includingcomplexities,strengths,and weaknesses.WethenillustratetheuseofGPusingexamplesfromcropbreedingprograms withgenomicGEinteractions,aswellasresultsofgeneticgainsfromrapidcycleGSin maize.WealsospeculateontheprospectsforGSandGPinplantbreeding.Mostoftheresults presentedinthisreviewincludestudiesperformedonmaizeandwheatfromtheInternational MaizeandWheatImprovementCenter(CIMMYT),aswellasonchickpeafromtheInternational CropsResearchInstitutefortheSemi-AridTropics(ICRISAT)

Genomic-EnabledPredictionModelsandApplications:Copingwith Complexity

ThecomplexityofapplyingGPinbreedingoccursatdifferentlevelsandisinﬂuencedbyseveral factors.Whenatraitisaffectedbyalargenumberofloci,GPaccuracydependsonseveral geneticfactors:(i)thesizeandgeneticdiversityoftheTRNpopulationanditsrelationshipwith theTST population[4];thatis,whetherthecultivars inthe TRNarerelatives(closeand/or distant)ofcultivarsintheTSTset;(ii)theheritabilityofthetrait(s)underselection[complextraits withlowheritabilityandsmallmarkereffectsaresuitableforGSandGP,whereaslesscomplex traits(withhighheritability)canbepredictedbyafewmarkerswithrelativelylargeeffects);and (iii)forcomplextraitswithlargenumbersofmarkersthatarenotinlinkagedisequilibrium(LD) withthe QTL, GP accuracy is lower [5]and increases when the heritability and TRNsize increase.StudieshaveshowntheimportanceofselectinganappropriateTRNpopulationthat optimizestheaccuracyofthepredictionsofthenonphenotypedcultivarsintheTSTset[6]. Dependingonthetrait,theincreaseinGPaccuracyreachesaplateauasthepopulationsize increases.Asimilartrendwasfoundforthenumberofmarkers[7,8].

Oneimportantgenetic-statistical complexityofGP modelsariseswhenpredicting nonphe-notypedindividualsinspeciﬁcenvironments(site–yearcombinations)byincorporatingGE interactionsintothestatisticalmodels.Equallyimportantisthegenomiccomplexityrelatedto GEinteractionsformulti-traits;theseinteractionscreatetraitandenvironmentalstructures that shouldbe dealt with byusing statistical-genetic models that exploitmulti-trait, multi-environmentvariance-covarianceandgenetic correlationsbetweenenvironments, between traits,and between traits and environments, simultaneously. Untanglingthe complexityof multi-traitgenomicsandmultipleenvironmentsrequiresatheoreticalframeworkthataccounts forthesecomplexinteractions[9](see‘Bayesianmulti-traitmultienvironmentgenomicmodel fornormalphenotypes’inthesupplementaryinformationonline).Interestingly,theuseofGPto improvediseaseresistancehasbeenchallenginginwheatfortworeasons:(i)selectionfor majorresistancegenescanbeephemeralduetochangesinpathogenraces;and(ii)breeding forminorresistancegeneswithsmalleffectsthroughoutGS(whichprovides durable resis-tance)mayfacetheusualcomplexitiesencounteredinGS[10].

(5)

AnotherlevelofcomplexityoccursinGSstatisticalpredictionmodelsbecausethenumberof markers (p) is larger than the population size (n) and the predictors (markers) are highly correlated. Thissituation results in a matrix of predictors that is rank deficient, making it impossibletocomputeleast-squareestimatesformarkereffects.Thecomplexityarisesfrom factorssuchasthecourseofdimensionality[11];thatis,undermodelswithp>>n,whichare notlikelihoodidentifiedandarepronetooverfitting,spuriousfeaturesanddatastructuresmay becaptured(see‘Thecomplexityofgenomicselectionandprediction’and‘Solutiontoan inverseproblem’inthesupplementaryinformationonline).Solutionstotheseproblemsinclude theuseof:(i)penalizedregression;(ii)variableselection;and(iii)dimensionalityreduction(e.g., principalcomponents),suchthatanewsetofpredictorsthatarenotcorrelatedisgenerated fromtheoriginalone(markers),thusallowingtheuseofunivariatedistributionsanddecreasing the computationtimeof the estimatesandthe prediction [12].A fourthsolution is to use statisticalmodelsthatassessGPcomplexitiesandhigh-densitymarkerplatformswithGE interactions,therebyaddingpowertotheGPmodels(seethenextsection).

GPmodelsbasedonbasicquantitativegeneticsdescribethephenotypicresponseasthesum ofageneticvalue(linearadditivemodels)andaresidualvalue.AlargebodyofGPresearchhas focusedondevelopingefﬁcientparametricandnonparametricstatistical andcomputational models withincreasedaccuracy forpredicting nonphenotyped genotypes[13]. Ingeneral, thesetheoreticalstudiesshowreasonablygoodpredictionaccuraciesforcomplextraitssuch asgrainyieldandothertraitsevaluatedbymeansofindependentrandomcross-validationdata partitioning.IncontrasttothewidespreaduseofGPtopredicttheperformanceofonetraitin theTST populations usingdatafrom the sametraitobservedinthe TRNpopulations,the complexityofextendingthistomulti-traitGPindiceshasnotreceivedmuchattention,except foramethodproposedbyCerón-Rojasetal.[14]thatisbasedonthemulti-traitGenomicBest Linear Unbiased Estimator (GBLUP) selection index, which worked well when applied to simulatedandrealdatasets.

Withadvancesin GS and GP, datavolumes andcomplexity have increased dramatically, leading to novel interdisciplinary research efforts to integrate computer science, machine learning,mathematics, physics,statistics, geneticsandquantitative genetics,and bioinfor-matics.Suchworkhasemergedasanewﬁeldofresearch(commonlyknownas‘datascience’ ordata-drivenscience)thataimstounifystatisticswithdataanalysis,datamining,andsoon. Theinterdisciplinaryresearchersindatasciencefocusoncomputingmoreaccuratepredictive valuesbyusingstatisticalmodelsormachine-learningmodels(R.McDowell,MScthesis,Iowa State University,2016). Neural networkmethodsare commonprediction toolsinmachine learning.Neuralnetworkscompriselayersofinterconnectedneurons,wheretheoutputofeach neuronisexpressedasthesumofacertainnumberofinputstoaneuronlocatedinaspeciﬁc networklayer,withaweightplusabias;thesumofallinputsis weightedbyan activation function.

WhenneuralnetworksareappliedtoGP,theinputlayeriseachmarkerwithoneneuronper marker;eachoftheneurons(markers)intheinputlayerisconnectedtoalltheneuronsinthe ﬁrsthiddenlayer,andtheseareconnectedtoallneuronsinthesecondhiddenlayer,andsoon, uptotheoutputlayer,whichisonehiddenneuronlayerwiththepredictionofeachofthe phenotypes.Recentdevelopmentsinneuralnetworksandspeediercomputerprocessinghave allowedtheadditionofnewlayerstotheneuralnetwork(deepmachinelearning)tocapture smallcrypticcorrelationsbetweeninputs[15],whichinGPareinteractionsbetweenmarkers. Initialapplications of machine-learning and neural networks in GP were demonstrated by Gianolaetal.[16,17],González-Camachoetal.[18,19],Pérez-Rodríguezetal.[20],Ornella etal.,[21],andGonzález-Recioetal.[22].Recentresultsfordeepmachinelearningappliedto GPcanbefoundelsewhere(R.McDowell,MScthesis,IowaStateUniversity,2016).

(6)

TheAccuracyofGPModels,andGeneticGainsAchievedbyGS

TheGPmodelsassessdifferentpredictionproblemsthatattempttomimicwhathappenswhen predictionsaremadeinrealsituations.Differentrandomcross-validationschemeshavebeen designedtosimulatethepredictionproblemsthatresearchersmayfacewhenperformingGS. There are four basic scenarios arising from combinations of tested (observed) lines (LT), untested(unobserved)lines(LU)withtested(observed)environment(ET),oruntested (unob-served)environments (EU). Predicting newly developed lines (or cultivars) in environments wherethey were nottestedisa caseof LU-ET(randomcross-validation1, CV1).Another problemistopredictlinesinsomeenvironmentsbutnotinothers;thisisLT-ET(random cross-validation2,CV2),whichattemptstomimiconeoftheobjectivesofGP:sparsetesting.Another problemcomprisespredictinglinesinuntestedenvironments;thatis,LT-EU(random cross-validation0,CV0).Finally,thereistheproblemofpredictinglinesneverobservedin never-observedenvironments,LU-EU(cross-validation00,CV00)[23–27].

Simulationandempiricalresultsobtainedbyrandomcross-validationsuggestthatGS enhan-cesgeneticgainsbyshorteningthebreedingcycle(rapidselectioncycle)and/orenhancing testingefficiencyinfieldevaluations[3,28–31].Resultsofusingrandomcross-validationon maizeandwheatbreedingdataindicatethatGScansignificantlyenhancepredictionaccuracy relatedtopedigreeandMASforlow-heritabilitytraits[13,18–20,32–43].ResultsofapplyingGS inmaizeandwheatbreedingindicateitseffectivenessinselection[44–48].

BreedingprogramsworldwidehavebeenstudyingandapplyingGSandGPinseveralcrops.In parallel,extensiveresearchhasresultedinnovelstatisticalmethodsthatincorporatepedigree, genomic,andenvironmentalcovariates(e.g.,weatherdata)intostatistical-geneticprediction models. GBLUP models [49,50] are widely used in GP, and the extension of GBLUP for incorporatingGEinteractionshasimprovedtheaccuracyofpredictingunobservedcultivars in environments [23,24,51–56]. New models for assessing the GP accuracy of discrete responsevariables(e.g.,ordinaldiseasedata, suchas rates,countdata, andsoon)were proposed[57–61]togetherwithBayesian genomicmodelsfor analyzingmultipletraits and multipleenvironments.AcomputationallyefﬁcientMarkovChainMonteCarlo(MCMC)method thatproducesfullconditionaldistributionsoftheparameters,leadingtoexactGibbssampling fortheposteriordistribution,hasalsobeendeveloped[9].Resultsfromsimulatedand(two) extensivedatasetsshowthat,whenthecorrelationbetweenthe traitsishigh,aproposed modelwithanunstructured covariancematrix ispreferredoverthe diagonalandstandard methodstohelpimprovethepredictionaccuracyforgrainyield.However,whencorrelations arelow,itisenoughtousethestandardmodel[9].

Depending onthe complexity of the traitand the prediction scenario, moresophisticated modelsresultinmoderate-to-highgainsinpredictionaccuracy.Inseveralstudies,complex models increased prediction accuracyby >10% at noadditional cost.The useof simple modelsmaymissimportantdatafeaturesandcauselossesofpredictionaccuracyforcomplex traits,wherenonlinearmodelsusuallygivesigniﬁcantlyhigherpredictionaccuracythanlinear models[20,36,55,56].OneoftheﬁrstassessmentsofGPinwheatbreedingwasperformedon acollectionof599 wheatlinesevaluatedinfour environmentsusingpedigreeandgenomic informationwithtwodifferentmodels[34].ThemodelswerethestandardGBLUPwiththelinear genomicmatrix Gandanonparametricmodel,ReproducingKernelHilbert Spaces(RKHS) regressionwithanonlineargenomicmatrix,theGaussiankernel(GK)[62].Themostcomplex model,RKHSusingaGKnonlinearkernel,includingpedigreeandmarkerinformation,gavethe highestpredictionaccuracy,rangingfrom15%to36%,withrespecttothepedigreemodel alone.TheGP modelRKHSwithGK,pedigree,andmarkershasbeenusedforpredicting resistancetoleaf,stem,andstriperust,septoria,tanspot,andStagonosporanodorumblotch

(7)

inwheat;comparedwithstandard least-squaresmultipleregressionmethods,RKHSgave increasesinaccuracyof42%and48%inthetworeportedstudies[63,64],respectively.

Basedontheencouragingresultsobtainedinsomemajorcereals,preliminarystepshavebeen takentodeployGStodevelopsuperiorlinesmorequicklyandenhancetherateofgeneticgain inafewlegumecrops,suchaspea,soybean,chickpea,groundnut,andpigeonpea[65].The recentavailabilityofcost-effective,high-throughputsequencinghasfacilitatedthe develop-mentoflarge-scalegenomicresourcesinmostlegumes.Soybeanwastheﬁrstlegumecrop where GS was deployed for improving yield and agronomictraits using genotyping-by-sequencing(GBS)inabreedingprogram[25].ToassesstheutilityofGSinsoybeanbreeding programsandunderstandtheeffectofmarkerselectionandgenotypeimputation,prediction accuracieswerecalculatedfortwogenomicpredictionmodels,namely,thestandardGBLUP modelwithadditiveeffectsandanextendedGBLUPwithadditive-by-additiveepistasis.High predictionaccuracy(0.64)indicatedthepotentialofusingGStoimprovegrainyield[25].

Inthecaseofchickpea,acollectionof320elitebreedinglineswasgenotypedusingDiversity Arraytechnology(DArTseq)andphenotypedforyield-relatedtraitsintwoenvironmentswith twodifferenttreatments(i.e.,rainfedandirrigated)intwodifferentseasons[65,66].Various statistical models (RR-BLUP, Kinship GAUSS, BayesCp,Bayes B, Baysian LASSO, and randomforestregressionorRFR)resultedinhighpredictionaccuraciesforthetraitsofinterest; however,notmuchvariationinpredictionaccuracyamongthedifferentmodelswasobserved

[65,66].Whenpopulationstructurewasincludedinthemodel,predictionaccuraciesimproved slightlyfordaystomaturity(DM),daystoﬂowering(DF),andseeddryweight(SDW),butnotfor seedyield(SY).

Ingeneral,earlystatisticalmodelsdevelopedforGPinanimalbreedingwerebasedon single-environmentassessments.However,inplantbreeding,GEinteractionsareofparamount importance.JustasGEinteractionsareafundamentalchallengeinplantbreeding,theyare alsoincreasinglyrecognizedasamajorcomplexityinGPmodels.

GPIncorporatingGenotypeEnvironmentInteraction

Includinghigh-densitymarkerplatformswithGEinteractionsincreasestheaccuracyofGP models; this has been extensively studied in bread wheat, maize, and legumes [23– 27,55,56,67].In all GP models that incorporateGE interactions, accuracywithrespect tosingle-environmentanalysesincreased10–40%onaverageinallthreecropspecies.The main models used to assess GP accuracy by incorporating GE interactions and their applicationtorealdataaredescribedbelow.

MultienvironmenttrialsforassessingGEinteractionshaveanimportantroleinplantbreeding forselectinghigh-performingandstablelinesacrossenvironments.Burgueñoetal.[23]were theﬁrsttousemarker-andpedigree-basedGBLUPmodelsforassessingGEinteractions undergenomicprediction,whileHeslotetal.[52]incorporatedcrop-modelingdatatostudy genomicGEinteractions.Areactionnormmodel,wherethemainandinteractioneffectsof markersandenvironmentalcovariatesare introducedusing high-dimensionalrandom vari-ance-covariance structures of markers and environmental covariates, was developed by Jarquínetal.[24]asanextensionofthewell-knownGBLUPmodel.

Thebaselinemodelforphenotypesevaluatedindifferentenvironments(yij)canbedescribed

usingEquation1:

(8)

wheremistheoverallmean,Ei(i=1,...,I)istherandomeffectoftheithenvironment,Ljisthe

randomeffectofthejth_line_(j=1,_._._._,J),_EL

ijistheinteractionbetweentheithenvironmentand

thejth line,andeijistherandomerror term.Theassumptionsare asfollows:EiiidN 0;s2E

, Lj iid N 0;s2 L ,ELij iid N 0;s2 EL ,andeij iid N 0;s2 e

,withN(.,.)denotinganormaldistribution,and ‘iid’ standing for independent and identically distributed. However, when the number of environmentsissmall,it maybebettertoassumeit asaﬁxedeffect.

MarkerscanbeintroducedinEquation1suchthattheeffectoftheline(Lj)canbereplacedbygj

expressedasalinearregressiononmarkercovariates(itapproximatesthegeneticvalueofthe jthline).Thevectorcontainingthe‘genomicvalues’isgN 0;Gs2

g

,wheres2

gisthegenomic

variance,andGisagenomicrelationshipmatrix.Also,theeffectoftheline(Lj)canbereplaced

byaj,withaN 0;As2a

,whereAisthenumericaladditiverelationshipmatrixderivedfrom pedigree,ands2

aistheadditivevariance.TheinteractioncovariancematrixistheHadamard

productoftwocovariancestructures,onedescribingrelationships betweenlinesbasedon geneticinformation(pedigreeorgenomic)andtheotherrelatingenvironmentsbymeansof environmentalcovariates.Whentheenvironmentaleffectisassumedasfixed,theinteraction termistheHadamardproductofthefixedeffectofenvironmentsandthecovariancematrixof linesthatwasbuiltwiththegeneticinformation.Whenenvironmentalcovariablesareused,the namedreactionnormisjustifiedbecausethegenotypiceffectisareactiontothose environ-mentalcovariables,whereas,whenenvironmentalcovariablesarenotused,thereactionnorm modelshaveunknownenvironmentaldeviations.

Thisreactionnormmodel[24]hasbeenappliedsuccessfully usingpedigreeandmolecular markersinmultienvironmentsaddingenvironmentalcovariates,forexample,incottontrials withenvironmentalcovariates[68],inGPofextensivewheatgenebankaccessions[69],inGP of Fe and Zn in wheat grain [70], in GP of bread wheat lines in sites locatedin diverse agroecologicalzones[27,25],inGPpredictionofwheatlinesevaluatedinMexicoandpredicted inlocationsinSouthAsia[71],andinGPofextensiveﬁeldtrialsinwheatondifferentcontinents

[67].ThereactionnormmodelwasalsoextendedtoGEinteractionswithmaizeandwheat diseaseordinalandcountdatabyMontesinos-Lópezetal.[57–60](see‘Bayesian genomic-enabledpredictionmodelsforordinalandcountdataincorporatinggenotypeenvironment interaction’ inthe supplementary information online). The increase in GP accuracy of the reactionnormmodelwithGEinteractionswasonaverage7–20%relativetotheprediction accuracyoftheGBLUPwithoutincludingGEinteractions.

InareassuchasKansas,USA,wheatproductionisimpactedbyyearlyclimatefactors,suchas extremetemperaturesanderraticprecipitation.Yearlyeffectsarenotrepeatableandrepresent thedynamicpartoftheGEinteractions,whereassiteeffectsrepresentthestaticrepeatable component.The accuracyof predicting unobserved historical sites in the wheat breeding programofKansasStateUniversityreaches0.54,butunobservedyearscanbepredictedwith only0.17accuracy[26].Resultsofusingthereactionnormmodelwithpedigreeandgenomic informationforpredicting400–500breadwheatlinesinSouthAsianlocationsusing approxi-mately 60000 lines trained in different Mexican environments indicated that GP is more accurate (0.35) than PS for predicting unobserved wheat lines in different South Asian locations(0.20)[71];perhaps thislastresultisthesimplestproofofthe conceptthatGP worksbetterthanPS.

GPIncorporatingMarkerEnvironmentInteractionModels

TheGEinteraction modeldescribedby López-Cruzetal. [53] decomposes themarker effectsintocomponentsthatarecommonacrossenvironments(stability)and environment-speciﬁc deviations(interaction) [see‘The GE (orME) model withlinear kernel’inthe supplementaryinformationonline].Thismodelborrowsinformationfromacrossenvironments

(9)

whileallowingmarker effectsto changeineach environment.Itcanbeimplemented using shrinkagemethodsaswellasvariableselectionmethodsand,thus,canbeusedtoidentify genomicregionswhose effectsare stableacrossenvironmentsandotherregions thatare responsibleforGEinteractions[54].TheGEinteractionmodelofLópez-Cruzetal.[53]is bestsuitedforthejointanalysisofpositivelycorrelatedenvironmentsandwasusedtoanalyze threeCIMMYTwheatdatasets.ThepredictionaccuracyoftheGEinteractionmodelwas greaterthanacross-environmentorsingle-environmentanalyses(5–29%whenpredictingeach ofthe environments).Recently,this genomicGEinteraction modelwas usedto predict untesteddurumwheatlinesinenvironments,aswellasavariableselectionmodeltoidentify genomicregionswhoseeffectsarestableacrossenvironmentsandothersthatare environ-mentspeciﬁc[54].

InthemodelsofJarquínetal.[24]andLópez-Cruzetal.[53],thekernelusedisthelinearkernel GBLUP.Inanewstudy,Cuevasetal.[55]proposedaGEinteractionmodelsimilartothatof López-Cruzetal.[53]butwithanonlinearkernel,theGaussiankernel(GK),whichissimilartothat usedintheRKHS [62][see‘TheGEmodelofLópez-Cruzetal.(2015)withanon-linear Gaussiankernelmethod’inthesupplementaryinformationonline].Usingtwoextensivedatasets, theauthorsfoundthat,forthewheatdatasets,theGKgavepredictionaccuraciesupto17% higherthantheGBLUPlinearkernel.Forthemaizedataset,theGKwasonaverage5–6%higher thantheGBLUPlinearkernel.TheadvantageoftheGKovertheGBLUPisthatitisamoreﬂexible kernelthataccountsforsmallandcomplexmarkermaineffectsandspeciﬁcinteractions. OneweaknessofbothGEinteractionmodelswithalinear[53]andnonlinearkernel[55]is thatpositivecorrelationsbetweenenvironmentsareassumed.However,whenthecorrelation between environments is low or negative, these models do not increase the prediction accuracyofenvironmentswithnegligibleornegativecorrelations.ThestrengthoftheBayesian modelswithGEinteractionsusedunderthe linearkernelGBLUPorunderthenonlinear Gaussian kernel (GK) is that they overcome the limitation of the previous models when associations betweenenvironments arenegligible ornegative, as shown by Cuevas etal.

[56].TheseauthorsproposedconsideringthegeneticeffectsðuÞdescribedbytheKronecker productofvariance-covariancematricesofgeneticcorrelationsbetweenenvironmentsand genomic kernels through markers under two linear kernel methods: linear (GBLUP) and Gaussian(GK).Anextension includesthesamegeneticcomponent asthefirstmodelðuÞ, plusanextraresidualgeneticcomponent,f,whichcapturesrandomeffectsbetween environ-mentsthatwerenotaccountedforbytherandomeffectsu(see‘Multienvironment genoty-peenvironment interactionmodelwithlinear andnonlinearkernels’inthesupplementary informationonline).Resultsoftheanalysesoffivedatasetsshowedthat:(i)GEinteraction modelsalwayshadsignificantlyhigherpredictionaccuracythansingle-environmentmodels; and(ii)thepredictionaccuracyofGEmodelswithuandfoverthemultienvironmentmodel withonlyuwashigher85%ofthetimewithGBLUPand45%ofthetimewithGKacrossthe five data sets.Results indicated that including the random effect f was still beneficial for increasingGP accuracyafteradjusting fortherandomeffectu.Predictionaccuracyofthe GEinteractionmodelmethods(GBLUPorGK)increasedupto85%overtheaccuracyof thesingle-environmentmodel.

MachineLearningforGenomicPrediction

InanappliedGScontext,thefocusshouldnotbeonpredictingallindividuals,butratheron classifyingindividualsintoupper,middle,orlowerclasses,dependingonthetraitunderselection. UsingclassiﬁersinGSisattractivebecausetheyaretrainedtomaximizetheprobabilityofan individualbeingamember ofthetargetclass,ratherthansearchingforitsoverall performance[21]. González-Camachoetal.[18]usedaradialbasisneuralnetworkonanextensivegenomicmaize datasetcomprisingseveraltraitsindifferentenvironmentsandcomparedtheresultswithRKHS

(10)

andwithalinearregressionmodel.Theaccuracyofneuralnetworkmethodswassimilartothatof RKHSandslightlyhigherthanthatofthelinearregression.

Inarecentstudy,twoneuralnetworkclassifiers(amultilayerperceptron,MLP,anda probabi-listic neural network, PNN) were compared for predicting the probability of an individual memberofatargetphenotypicclass,using33maizeandwheatgenomicandphenotypic datasets[19].Theauthorsfocusedonthe15thand30thpercentilesoftheupperandlower classestoselectthebestindividuals,ascommonlydoneinGS(fortraitssuchasgrainyield,the upperclassesarethetarget;fordiseases,thefocusisonthelowerclasses).Thecriterionfor assessing the prediction accuracy of MLP and PNN was the area under the receiver operatingcharacteristiccurve(AUC). Theparametersofbothclassifierswereestimated byoptimizingtheAUCforaspecifictargetclass.ThePNNwasfoundtobemoreaccuratethan theclassifierMLPforassigningmaizeandwheatlinestothecorrectupper,middle,orlower class.Resultsforthewheatdatasetwithcontinuoustraitssplitintotwo andthree classes showedthattheperformanceofPNNwiththreeclasseswasbetterthanPNNwithtwoclasses whenclassifyingindividualsintotheupperandlower(15%or30%)categories.Dependingon themaizetrait–environmentcombination,AUCforPNN30%orforPNN15%uppertrait(grain yield)washigherthantheAUCofMLP.Forthelowerclass,flowering(maleandfemale)traits, PNN15%andPNN30%alwayshadbetterAUCthanMLP15%andMLP30%[19].

GeneticGainsfromRapidSelectionCycleGS:CIMMYTMaizeBiparentalPopulations

ThefundamentalgoalofusingGSinbreedingistoachievegreatergeneticgainsatalowercost andinlesstimethanwithconventionalpedigreebreeding.Toachieveashorterintervalcycle,a favorableuseofGSispredictionwithinfull-sibfamilies,becausebiparentalpopulationshave highLDbetweenmarkerallelesandtraitalleleswithnogroupstructure.

TherearefewstudiesthatmeasurethegeneticgainsachievedthroughuseofaGS-based rapidselectioncycle.ThefirststudyconfirmingthepromiseofarapidselectioncycleinGPof biparentalpopulations,aswellaspreviousfindingsfromrandomcross-validationstudies,was conductedbyMassmanetal.[44]andshowedthatGSimprovedmaizegeneticgainsperunit oftime.GeneticgainswerealsoreportedbyAsoroetal.[45]inoatandbyRutkoskietal.[48]in wheat,whichshowedthatGSandPSprovidedsimilarrealizedgeneticgainsperunitoftime.

GeneticgainstudiescomparingGScyclesC0,C1,C2,andC3withpedigreeselectiononeight

CIMMYTtropical biparental maize populations in Sub-Saharan Africa were conducted by Beyeneetal.[47] underdrought conditions.Theauthorsshowedthat:(i)theaveragegain percycle(acrossalleightbiparentalpopulations) fromGS was0.086t/haunder managed droughtconditions;(ii)theaveragegrain yieldofC3-derivedhybridswassigniﬁcantlyhigher

thanthatofhybridsderivedfromC0;and(iii)threeGScyclescanbeachievedin1year.The

authorsconcludedthathybridsderivedfromC3produced7.3%highergrainyieldthanthose

developedthroughconventionalpedigreebreeding.Bycontrast,theaveragegainpercycle usingmarker-assistedrecurrentselection(MARS)acrosstenpopulationswas0.051t/ha percycle under managed drought stress. Extensive ﬁeld trialswere conducted inseveral manageddroughtenvironmentsinSub-SaharanAfricatoevaluatethegrainyieldperformance ofmaizehybridsderivedfromGS-basedlinesselectedfromdifferentpopulations.Linesderived fromcycleC3GSinhybridcombinationproducedsigniﬁcantlyhigheraveragegrainyieldthan

linesfromC0GSinhybridcombination(Y.Beyene,2017).

Anotherexampleofgeneticgainsfromrapid-cycleGSonCIMMYTmaizeistwo biparental maizepopulations(F2:3)fromAsia(CAP1andCAP2)thatweredevelopedandevaluatedfor

testcrossperformanceunderdroughtandoptimalconditions[72].Thegeneticgainsperyear forPSversusGSindroughtenvironmentswere0.067t/haversus0.124t/ha,respectively,for

(11)

CAP1,and0.076t/haversus0.104t/ha,respectively,forCAP2.Thecorrespondinggenetic gainsperyearforPSversusGSinoptimalenvironmentswere0.084t/haversus0.140t/ha, respectively,forCAP1,and0.123t/haversus0.13t/ha,respectively,forCAP2.Resultsofthis studyconﬁrmedthatGSofsuperiorplantphenotypesproducedrapidgeneticgainsindrought toleranceinmaize.

GeneticGainsfromRapidSelectionCycleGS:CIMMYTMultiparentalPopulations

Most GS results in maize have been achievedby rapid cycling of biparental populations (e.g.,F2:3segregatingpopulationscrossedwithatesterfromtheoppositeheteroticgroup).

Fiveyearsago,theGlobalMaizeProgramofCIMMYTdesignedaGSrapidcycleof multi-parentalcrosses.Fifteenelitetropicalmaizelineswerecrossedindiallelicfashiontoformcycle 0(C0),whichwasgenotypedusingGBSmarkersandphenotypedattwolocationsinMexico;

plantswiththebestphenotypewereselectedtoformtheparentsforGScycle1(C1).TheC1

parentswereintercrossedandtheprogenygenotypedwiththesameGBSmarkersusedfor theC0population[73].

Twocyclesperyearwerecompletedand,attheendofthesecondyear,seedsfromcyclesC0,

C1,C2,C3,andC4werecollected,assembled,andsown attwolocationsinMexico.Fifty

entries per genomic cycle were sown at each location, together with two widely used commercialtropical maizehybrids.Average genomicgrain yieldgainsreached 0.134t/ha, withC0producing6.653t/ha.GrainyieldofC1wasslightlylower(6.488),andcyclesC2,C3,

andC4producedmeanyieldsof7.022,6.879,and7.126t/ha,respectively.Therealizedgrain

yieldfromC1toC4reached0.225t/hapercycle,whichtheauthorsconsideredequivalentto

0.1t/haperyearoverabreedingperiodof4.5years.Aslightdeclineingeneticdiversitywas detectedatC4comparedwithC0.

ProspectsforEnhancedUseofGSinPlant Breeding

ToacceleratethedeploymentofGSincropbreedingwhilereducingthecostoflineandhybrid development,hereweexaminethecombineduseofGSwithhighthroughputphenotype(HTP) forearly-generationtestinginplantbreeding.Inaddition,weexaminetheapplicationofGSin germplasmenhancementandprebreedingusinggenebankaccessions.

CombiningMulti-TraitMultienvironmentGSwithHigh-ThroughputPhenotyping:The

CIMMYTCase

Inmodernagriculture,high-resolutioncamerasareusedtoobtainhundredsofreﬂectancedata measuredatdiscretenarrowbands(wavelengths)tocoverthewholespectrum.This informa-tionisusedtoconstructvegetationindices(e.g.,GreenNormalizedDifferenceVegetationIndex orNDVI)topredictprimarytraits(e.g.,grainyield).However,theseindicesuseonly certain bandsandarecultivarspeciﬁc;thus,theyfailtocaptureconsiderableinformationorperform robustlyforallcultivars.Theadvantageofthisimagingtechnologyisthatmassivenumbersof phenotypescanbescreenedinexpensivelyduringearly-generationtesting.

ThemainobjectiveofGSisto reducephenotypingcosts byusingmarkersandaccelerate geneticgains,whereastheaimofHTPistomeasurehigh-densityphenotypesinverylarge numbersof individualsorbreeding linesacrosstime andspace using remoteorproximal sensing.Thiscanincreaseboththeaccuracyandintensityofselectionand,subsequently,the selection response, while decreasingphenotyping costs. Themain idea of HTP is to use secondarytraitsrelatedtograinyield,diseaseresistance,orend-usequalitythatmaybeuseful inearly-generationtestingoflines.Arecentstudy[74]foundthatthehighestaccuracywhen predictinggrainyieldisachievedbytheuseofbroaderselectionofwavelengths(see‘Predicting grainyieldusingcanopyhyperspectralreﬂectanceinwheatbreedingdata’inthe supplemen-taryinformationonline).

(12)

UsingHTPplatformswithvegetationindicesaspredictortraitsinpedigreeandGSmodelscan increasethepredictionaccuracyforgrainyield[75].Predictionduringearly-generationtestingis important to enhancegenetic gains during early stages of the breedingcycle, but GS is economicallyunfeasibleatthosestagesduetothelargenumberofplantsintheﬁeld;thus, whileassessingpedigreerelationshipshastheadvantageofnotcostinganything,theirusefails toexploitMendeliansampling,incontrasttoGS.Therefore,usingmultivariatepedigree-based predictionmodelsthatincorporatesuchpredictortraitswhileusinganHTPplatformoffersa low-costsolutionforpredictinggrainyieldamongandwithinfamiliesinearly-generationtesting, asdemonstratedbyRutkoskietal.[75].Theseauthorsfoundthatwithin-environment sec-ondaryvegetativeindicesincreasedpredictionaccuraciesforgrainyieldby59%usingpedigree relationshipsandby70%usinggenomicrelationships.Theseresultsindicatethatsecondary traitsmeasuredbyHTPandusedwithpedigreecanimprovethepredictionaccuracyofprimary traits during the early stages of breeding. Reynolds and Langridge [76] found that HTP techniquesareeffectiveforevaluatinggeneticresourcesforcomplextraitexpression.

HTPplatformsareusefultomeasuresecondarytraitsthataregeneticallycorrelatedwithgrain yieldandthatcanbeincorporatedinmultivariatepedigreeandgenomicpredictionmodels, improvingindirectselectionforgrainyield.Sunetal.[77]usedastatisticalmodelthatestimated BVsofsecondarytraitstogetherwithmultivariatepedigreeandgenomicmodels;theauthors founda70%(onaverage)improvementintheaccuracyofselectionforgrainyield,including secondarytraitsinbothTRNandTSTpopulations.

Arecentstudydevelopedstatisticalmodels toassesshyperspectralwavelength environ-ment interactions in HTP, incorporating genomic and pedigree GE interactions [78]. AlthoughlittleGPaccuracywasachieved,importanthyperspectralwavelengthenvironment interactionswereobserved,demonstratingthatGScoupledwithHTPcanbeapowerfultool appliedtoearly-generationtestingofalargenumberofselectioncandidates.Thefull condi-tionaldistributionsformodelingthethree-waytraitGEinteractionmodelof Montesinos-Lópezetal.[9]canbeadaptedtoincludethehyperspectralbandsinthefunctionalregression approachrecentlydescribedbyMontesinos-Lópezetal.[74,78].

ExploringtheApplicationofGStoGeneBankAccessionsforGermplasmEnhancement

Althoughtheaccessionsstoredingenebanksrepresentarichresourceforbreeders,alleles needtobeextractedfromtheaccessionsforcultivardevelopment,whichistime-consuming and expensive [79,80]. Lengthy prebreeding programs are requiredto develop lines that combinefavorableallelesfromthegermplasmbankwithgoodagronomicperformanceand that can be used as parents in a breeding program. Based on the simulation of various prebreedingoptions, germplasm enhancement breeding programs can start directly from landracesorfromlandracescrossedwithelitetesters[81].

The GP accuracy of 8416 Mexican wheat landrace accessions and 2403 Iranian wheat landraceaccessions from the CIMMYT gene bank were examined by Crossa et al. [69]. Theauthorsmeasuredtwotraitsintwoenvironmentsandseveralhighlyheritabletraitsina singleoptimumenvironment.TheGPaccuracyforseveraltraits(maturity,qualitytraits,and grainyieldandyieldcomponents)underthedifferentpredictionscenarioswashigh,ranging from0.5to0.7.Insoybean,Jarquínetal.[25]analyzedtheUSDAsoybeancollectionusing differentcross-validationschemesandgroupingfactors(trials,states,andgenetic subpopu-lations); results showed relatively high prediction accuracies that should help breeders to introgressusefulgeneticvariation(Box1).

ThesepreliminaryresultsoftheaccuracyofGPofgenebankaccessionsfavortheideaof applyingGStointrogresslandraceaccessionsinelitegermplasmandformgenepoolsand

(13)

populations suitable for prebreeding and germplasm enhancement programs. However, furtherresearchisrequiredinthisarea.

ConcludingRemarks

Manystatistical methodshave beendevelopedto predictunobservedindividualsinGS.In general,linearmodels(e.g.,GBLUP)andmachine-learningalgorithmshavebeensuccessfulin recognizing complexpatterns andmaking correctdecisions based ondata. Kernel-based methods,suchastheRKHS,haveextensivelydeliveredgoodgenomicpredictionsinplants. SeveralstatisticalmodelsbasedonthestandardGBLUPthatincorporateGEinteractionsin genomicand pedigreepredictions have providedsubstantial increases inthe accuracyof predicting unobserved individuals in environments. These GS prediction models can help scientistsindifferentdisciplinesto developdrought- andheat-tolerant plantsby exploiting positiveGEinteractions.Modelingmulti-traitmulti-environmentisessentialforimprovingthe predictionaccuracyoftheperformanceofnewlydevelopedlinesinfutureyears.

TheuseofstatisticalmodelsinextensivehyperspectralimagetechnologyforHTP,together withgenomicand pedigreeinformation inearly-generationtesting,offers anopportunityto accelerate genetic gains by increasing the intensity of selection. Deep machine-learning methodsusing neuralnetworking appearpromisingto increase the accuracyof genomic-enabledprediction.Genomicselectionhasaclear-cutadvantageoverpedigreebreedingand MAStoenhancegeneticgainsforcomplextraits.Theappropriateuseofgenotypingplatforms combinedwithprecisephenotypingplatformswillalsohelpenhancepredictionaccuracyand accelerategenetic gainsbyshortening thebreeding cycle. Further researchis requiredto incorporateGSwithHTPasaroutinecomponentinplantbreedingprograms.

Developing GP models for gene bankaccessionswill be important to access unexplored diversityandfast-trackusefulportionsintobreedingprograms(alsoseeOutstanding Ques-tions).Currently,GSisthemost-promisingbreedingmethodtospeedthedevelopmentand releaseofnewgenotypes;therefore,theuseofGStoformgenepoolsandpopulationsfrom richgenebankaccessionsmeritsextensiveandintensivestudy,especiallygiventhe vulnera-bilityofelitelinesandhybridstosevereclimatechangeeffects.

Box1.AnExampleofGSDeploymentinPrebreeding

ResultsfromusingalargenumberofMexicanandIranianlandracesstoredinthewheatgenebankofCIMMYTindicated thatpredictionaccuraciesfortraitsevaluatedinoneenvironmentforTRN20-TST80designrangedfrom0.407to0.677 forMexicanlandracesandfrom0.166to0.662forIranianlandraces.Also,predictionaccuraciesofthe20%coreset weresimilartothoseobtainedforTRN20-TST80,rangingfrom0.412to0.654forMexicanlandracesandfrom0.182to 0.647forIranianlandraces.Interestingly,thecorrelationsforcomplexGYSMtraitswereapproximately0.4forboth MexicanandIranianlandraces.ResultsofcorrelationswhenincorporatingGEinteractionsfordaystoheadingand daystomaturityforTRN20-TST80wereapproximately0.61forMexicanlandracesand0.60forIranianlandraces.The 20%coresethadcorrelationsofapproximately0.58forMexicanlandracesand0.50–0.59forIranianlandraces.

Predictionaccuraciesofthe10%and20%diversityandpredictivecoresubsetsweregenerallyofamagnitudethat appearedusefulforpredictingthevalueofgenebankaccessionsandforbreeding.Theﬁrstapplicationwouldbeto predictthegeneticvalueofallgenotypedaccessionsinagenebankandthenphenotypeonlythosethathavethe highestpredictedgeneticvalues.Then,abreedercouldbeginprebreedingfollowingseveralstrategies.Forexample, onedecisionwouldbe toinitiate a prebreedingconversionapproachbycrossingthe bestaccessionsamong themselvestoimprovetheaccessionsperseuntiltheybecomeelite.Anotherstrategycouldbetostartanintrogression approachbycrossingselectedaccessionstoelitematerials.

Nevertheless,applicationofGSforgermplasmenhancementwillhavetobeperformedincombinationwithstandard introgressionofexotictoelitegermplasmand,possibly,aseriesofbackcrossestotheelitematerial.Extractingthebest accessionsdirectlyfromthegenebanksandformingproductivegenepoolsmaybetheﬁrststagebeforereﬁningthe genepoolsandevaluatingthemunderdifferentenvironmentalconditions.

OutstandingQuestions

HowcanGShaveanimportantrolein enhancingtherateofgeneticgain? Are currently available GS models capableofprovidingthemost accu-ratepredictions?

Howcan GS acceleratethe ﬂowof favorableallelesdirectlyfromthegene banktobreedingprograms? Are weready todeployGS in pre-breeding and germplasm-enhance-mentprograms?

IsitpossibletocombineGSand pedi-greeselectionwithHTPtoaccelerate genetic gains and save resources when developing lines during early-generationtesting?

(14)

Acknowledgments

Theauthorsaregratefultotheircolleagues,collaborators,andbyusingﬁeldandlabtechnicianfromCIMMYTand ICRISAT,aswellasscientistsinNationalProgramswhocollectedthevaluabledatausedinthevariousstudies.The authorswouldliketothankKevinPixley,DanielGianola,andfouranonymousreviewersfortheirpositive,careful,and detailedreviewsofthemanuscript;theircomments,editing,andsuggestionssigniﬁcantlyimprovedthequalityand readabilityofthemanuscript.TheauthorsalsothankMichaelListmanforeditingtherevisedversionofthemanuscript.

SupplementalInformation

Supplementalinformationassociatedwiththisarticlecanbefound,intheonlineversion,athttp://dx.doi.org/10.1016/j. tplants.2017.08.011.

References

1. Bernardo,R.(2008)Molecularmarkersandselectionforcomplex traitsinplants:learningfromthelast20years.CropSci.48, 1649–1664

2. Bernardo,R.(2016)BandwagonsI,too,haveknown.Theor. Appl.Genet.129,2323–2332

3. Meuwissen,T.H.E.etal.(2001)Predictionoftotalgeneticvalue usinggenome-widedensemarkermaps.Genetics157, 1819–1829

4. Pszczola,M.etal.(2012)Reliabilityofdirectgenomicvaluesfor animalswithdifferentrelationshipswithinandtothereference population.J.DairySci.95,389–400

5. Daetwyler,H.D.etal.(2010)Theimpactofgeneticarchitectureon genome-wideevaluationmethods.Genetics185,1021–1031

6. Isidro,J.etal.(2015)Trainingsetoptimizationunderpopulation structureingenomicselection.Theor.Appl.Genet.2015,145–158

7. Lorenz,A.J.etal.(2012)Potentialandoptimizationofgenomic selectionforFusariumheadblightresistanceinsix-rowbarley. CropSci.52,1609–1621

8. Arruda,M.P.etal.(2015)Genomicselectionforpredicting fusar-iumheadblightresistanceinawheatbreedingprogram.Plant Genome8,1–12

9. Montesinos-López,O.A.etal.(2016)AgenomicBayesian multi-traitandmulti-environmentmodel.G36,2725–2744

10.Poland,J.andRutkoski,J.(2016)Advancesandchallengesin genomicselectionfordiseaseresistance.Annu.Rev. Phytopa-thol.54,79–98

11.Bellman,R.E.(1961)AdaptiveControlProcesses:AGuidedTour, PrincetonUniversityPress

12.Cuevas,J.etal.(2014)Bayesiangenomic-enabledpredictionas aninverseproblem.G34,1191–2001

13.delosCampos,G.etal.(2012)Wholegenomeregressionand predictionmethodsappliedtoplantandanimalbreeding. Genet-ics193,327–345

14.Cerón-Rojas,J.J.etal.(2015)Agenomicselectionindexapplied tosimulatedandrealdata.G35,2155–2164

15.LeCun,Y.etal.(2015)Deeplearning.Nature521,436–444

16.Gianola,D.etal.(2006)Genomic-assistedpredictionofgenetic valueswithasemi-parametricprocedure.Genetics173, 1761–1776

17.Gianola,D.etal.(2011)Predictingcomplexquantitativetraitswith Bayesianneuralnetworks:acasestudywithJerseycowsand wheat.BMCGenet.12,87

18.González-Camacho,J.M.etal.(2012)Genome-enabled predic-tionofgeneticvaluesusingRadialBasisFunctionNeural Net-works.Theor.Appl.Genet.125,759–771

19.González-Camacho,J.M.etal.(2016)Genome-enabled predic-tionusingprobabilisticneuralnetworkclassiﬁers.BMCGenomics 17,208

20.Pérez-Rodríguez,P.etal.(2012)Acomparisonbetweenlinear andnon-parametricregressionmodelsforgenome-enabled pre-dictioninwheat.G3:Genes|Genomes|Genetics2,1595–1605

21.Ornella,L.etal.(2014)Genomic-enabledpredictionwithclassi ﬁ-cationalgorithms.Heredity112,616–626

22.Gonzalez-Recio,O.etal.(2014)Machinelearningmethodsand predictiveabilitymetricsforgenome-widepredictionofcomplex traits.Livest.Sci.166,217–231

23.Burgueño,J.etal.(2012)Genomicpredictionofbreedingvalues whenmodelinggenotypeenvironmentinteractionusing pedi-greeanddensemolecularmarkers.CropSci.52,707–719

24.Jarquín,D.etal.(2014)Areactionnormmodelforgenomic selectionusinghigh-dimensional genomicandenvironmental data.Theor.Appl.Genet.127,595–607

25.Jarquín,D.etal.(2016)Genotypingbysequencingforgenomic predictioninasoybeanbreedingpopulation.BMCGenomics15, 740

26.Jarquín,D.etal.(2017)Increasinggenomic-enabledprediction

accuracybymodeling genotypeenvironment interactionin

Kansaswheat.PlantGenomePublishedonlineJune8,2017.

http://dx.doi.org/10.3835/plantgenome2016.12.0130

27.Saint-Pierre,C.etal.(2016)Genomicpredictionmodelsforgrain yieldofspringbreadwheatindiverseagro-ecologicalzones.Sci. Rep.6,27312

28.Bernardo,R.andYu,J.M.(2007)Prospectsforgenome-wide selectionforquantitativetraitsinmaize.CropSci.47,1082–1090

29.Lorenzana,R.E.andBernardo,R.(2009)Accuracyofgenotypic valuepredictionsformarker-basedselectioninbiparentalplant populations.Theor.Appl.Genet.120,151–161

30.Heffner,E.L.etal.(2009)Genomicselectionforcrop improve-ment.CropSci.49,1–12

31.Heffner,E.L.etal.(2010)Plantbreedingwithgenomicselection: gainperunittimeandcost.CropSci.50,1681–1690

32.delosCampos,G.etal.(2009)Predictingquantitativetraitswith regressionmodelsfordensemolecularmarkersandpedigree. Genetics182,375–385

33.delosCampos,G.et al.(2010)Semi-parametric genomic-enabledpredictionofgeneticvaluesusingreproducingkernel Hilbertspacesmethods.Genet.Res.92,295–308

34.Crossa,J.etal.(2010)Predictionofgeneticvaluesofquantitative traitsinplantbreedingusingpedigreeandmolecularmarkers. Genetics186,713–724

35.Crossa,J.etal.(2011)Genomicselectionandpredictioninplant breeding.J.CropImprov.25, 239–226

36.Crossa,J.etal.(2013)Genomicpredictioninmaizebreeding populations with genotyping-by-sequencing. G3: Genes| Genomes|Genetics3,1903–1926

37.Crossa,J.etal.(2014)GenomicpredictioninCIMMYTmaizeand wheatbreedingprograms.Heredity112,48–60

38.Pérez-Rodríguez,P.anddelosCampos,G.(2014) Genome-wideregressionandpredictionwiththeBGLRstatistical pack-age.Genetics198,483–495

39.Hickey,J.M.etal.(2012)Factorsaffectingtheaccuracyof geno-typeimputationinpopulationsfromseveralmaizebreeding pro-grams.CropSci.52,654–663

40.Riedelsheimer,C.etal.(2012)Genomicandmetabolicprediction ofcomplexheterotictraitsinhybridmaize.Nat.Genet.44, 217–220

41.Zhao,Y.etal.(2012)AccuracyofgenomicselectioninEuropean maizeelitebreedingpopulations.Theor.Appl.Genet.124, 769–776

42.Windhausen,V.S.etal.(2012)Effectivenessofgenomic predic-tionofmaizehybridperformanceindifferentbreedingpopulations andenvironments.G3:Genes|Genomes|Genetics2,1427–1436

(15)

43.Technow,F.etal.(2013)Genomicpredictionofnortherncornleaf blightresistanceinmaizewithcombinedorseparatedtraining setsforheteroticgroups.G3:Genes|Genomes|Genetics3, 197–203

44.Massman,J.M.et al.(2013) Genome-wideselectionversus marker-assistedrecurrentselectiontoimprovegrainyieldand stover-qualitytraitsforcellulosicethanolinmaize.CropSci.53, 58–66

45.Asoro,F.G.etal.(2013)Genomic,marker-assisted,and pedi-gree-BLUPselectionmethodsforb-glucanconcentrationinelite oat.CropSci.53,1894–1906

46.Combs,E.andBernardo,R.(2013)Genome-wideselectionto introgress semidwarf corn germplasm into U.S. Corn Belt inbreds.CropSci.53,1427–1436

47.Beyene,Y.etal.(2015)Geneticgainsingrainyieldthrough genomicselectionineightbi-parentalmaizepopulationsunder droughtstress.CropSci.55,154c163

48.Rutkoski,J.etal.(2015)Geneticgainfromphenotypicand genomicselectionforquantitativeresistancetostemrustof wheat.PlantGenome8,1–10

49.VanRaden,P.M.(2007)Genomicmeasuresofrelationshipand inbreeding.InterbullAnnu.Meet.Proc.37,33–36

50.VanRaden,P.M.(2008)Efﬁcientmethodstocomputegenomic predictions.J.DairySci.91,4414–4423

51.Heslot,N.etal.(2012)Genomicselectioninplantbreeding:a comparisonofmodels.CropSci.52,146–160

52.Heslot,N.etal.(2014)Integratingenvironmentalcovariatesand cropmodelingintothegenomicselectionframeworktopredict genotypebyenvironmentinteractions.Theor.Appl.Genet.127, 463–480

53.López-Cruz,M.etal.(2015)Increasedpredictionaccuracyin wheatbreedingtrialsusingamarkerenvironmentinteraction genomicselectionmodel.G3:Genes|Genomes|Genetics5, 569–582

54.Crossa,J.etal.(2016)ExtendingtheMarkerEnvironment interaction model for genomic-enabled prediction and genome-wideassociationanalysesindurumwheat.CropSci. 56,1–17

55.Cuevas, J. et al.(2016) Genomic prediction of genotype

-environment interaction kernel regression models. Plant

GenomePublishedonlineSeptember22,2016.http://dx.doi.

org/10.3835/plantgenome2016.03.0024

56.Cuevas,J.etal.(2017)Bayesiangenomicpredictionwith gen-otypeenvironment interaction kernel models. G3: Genes| Genomes|Genetics7,41–53

57.Montesinos-López, O.A.et al.(2015) Threshold modelsfor genome-enabledpredictionofordinalcategoricaltraitsinplant breeding.G3:Genes|Genomes|Genetics5,291–300

58.Montesinos-López,O.A.etal.(2015)Genomic-enabled predic-tionofordinaldatawithBayesianlogisticordinalregression.G3: Genes|Genomes|Genetics5,2113–2126

59.Montesinos-López,O.A.etal.(2015)Genomicpredictionmodels forcountdata.J.Agric.Biol.Environ.Stat.20,533–554

60.Montesinos-López,O.A.etal.(2016)GenomicBayesian predic-tionmodelforcountdatawithgenotypeenvironment interac-tion.G3:Genes|Genomes|Genetics6,1165–1177

61.Montesinos-López,O.A.etal.(2017)ABayesian poisson-log-normalmodelforcountdataformultiple-trait multiple-environ-ment genomic-enabled prediction. G3: Genes|Genomes| Genetics7,1595–1606

62.Gianola,D.andvanKaam,J.B.C.H.M.(2008)Reproducing ker-nelhilbertspaceregressionmethodsforgenomic-assisted pre-dictionofquantitativetraits.Genetics178,2289–2303

63.Philomin,J.etal.(2017)Comparisonofmodelsand whole-genomeproﬁlingapproachesforgenomic-enabledprediction ofSeptoriatriticiblotch,Stagonosporanodorumblotch,and tanspotresistanceinwheat.PlantGenome10,1–16

64.Philomin,J.etal.(2017)Genomicandpedigree-basedprediction forleaf,stem,andstriperustresistanceinwheat.Theor.Appl. Genet.130,1415–1430

65.Varshney,R.K.(2016)Excitingjourneyof10yearsfromgenomes toﬁeldsandmarkets: some successstories of genomics-assistedbreedinginchickpea,pigeonpeaandgroundnut.Plant Sci.242,98–107

66.Roorkiwal,M.etal.(2016)Genome-enabledpredictionmodels foryieldrelatedtraitsinchickpea.Front.PlantSci.7,1666

67.Sukumaran,S.etal.(2017)Genomicpredictionwithpedigree andgenotypeenvironmentinteractioninspringwheatgrownin SouthandWestAsia,NorthAfrica,andMexico.G3:Genes| Genomes|Genetics7, 481–197

68.PérezRodríguez,P.etal.(2015)Apedigreereactionnormmodel forpredictionofcotton(Gossypiumsp.)yieldinmulti-environment trials.CropSci.55,1143–1151

69.Crossa,J.etal.(2016)Genomicpredictionofgenebankwheat landraces.G3:Genes|Genomes|Genetics6,1819–1834

70.Govidan,V.etal.(2016)Genomicpredictionforgrainzincand ironconcentrationsinspringwheat.Theor.Appl.Genet.129, 1595–1605

71.Pérez-Rodríguez,P.etal.(2017)Single-stepgenomicand

pedi-greegenotypeenvironmentinteractionmodelsforpredicting

wheatlinesininternationalenvironments.PlantGenome

Pub-lished online May 5, 2017. http://dx.doi.org/10.3835/

plantgenome2016.09.0089

72.Vivek,B.S.etal.(2017)Useofgenomicestimatedbreeding valuesresultsinrapidgeneticgainsfordroughttolerancein maize.PlantGenome10,1–8

73.Zhang,X.etal.(2017)Rapidcyclinggenomicselectionina multiparentaltropicalmaizepopulation.G3:Genes|Genomes| Genetics7,1–12

74.Montesinos-López,O.A.etal.(2017)Predictinggrainyieldusing canopyhyperspectralreﬂectanceinwheatbreedingdata.Plant Methods13,4

75.Rutkoski,J.etal.(2016)Predictortraitsfromhigh-throughput phenotypingimproveaccuracyofpedigreeandgenomic selec-tionforgrainyieldinwheat.G36,2799–2808

76.Reynolds,M.andLangridge,P.(2016)Physiologicalbreeding. Curr.Opin.PlantBiol.31,162–171

77.Sun,J.etal.(2017)Multi-trait,randomregression,orsimple

repeatabilitymodelinhigh-throughputphenotypingdataimprove

genomicpredictionforgrainyieldinwheat.PlantGenome

Pub-lished online Mary 18, 2017. http://dx.doi.org/10.3835/

plantgenome2016.11.011

78.Montesinos-López,A.etal.(2017)GenomicBayesianfunctional regressionmodelswithinteractionsforpredictingwheatgrain yieldusinghyper-spectralimagedata.PlantMethods62,13

79.McCouch,S.etal.(2012)Genomicsofgenebank:acasestudy forrice.Am.J.Bot.99,407–423

80.Yu,X.etal.(2016)Genomicpredictioncontributingtoapromising globalstrategytoturbochargegenebanks.Nat.Plants2,16150

81.Gorjanc,G.etal.(2016)Initiatingmaizepre-breedingprograms usinggenomicselectiontoharnesspolygenicvariationfrom landracepopulations.BMCGenomics17,30