• No results found

Genomic Selection in Plant Breeding: Methods, Models, and Perspectives

N/A
N/A
Protected

Academic year: 2021

Share "Genomic Selection in Plant Breeding: Methods, Models, and Perspectives"

Copied!
15
0
0

Loading.... (view fulltext now)

Full text

(1)

Feature

Review

Genomic

Selection

in

Plant

Breeding:

Methods,

Models,

and

Perspectives

José

Crossa,

1,

* Paulino

Pérez-Rodríguez,

2

Jaime

Cuevas,

3

Osval

Montesinos-López,

4

Diego

Jarquín,

5

Gustavo

de

los

Campos,

6

Juan

Burgueño,

1

Juan

M.

González-Camacho,

2

Sergio

Pérez-Elizalde,

2

Yoseph

Beyene,

1

Susanne

Dreisigacker,

1

Ravi

Singh,

1

Xuecai

Zhang,

1

Manje

Gowda,

1

Manish

Roorkiwal,

7

Jessica

Rutkoski,

8

and

Rajeev

K.

Varshney

7,

*

Genomic selection (GS) facilitates the rapid selection ofsuperior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles,andbasisofGSandgenomic-enabledprediction(GP)aswellasthe geneticsandstatistical complexitiesofGPmodels,includinggenomic geno-typeenvironment(GE)interactions.WealsoexaminetheaccuracyofGP modelsand methods fortwo cereal crops and two legumecrops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement(i.e.,prebreeding)programscould acceleratetheflowofgenes fromgene bank accessionsto elitelines. Recent advances in hyperspectral imagetechnologycouldbecombinedwithGSandpedigree-assistedbreeding. TheRole ofGenomic-Enabled PredictioninPlant Breeding

Beginningduringthe1980s,thedevelopmentofdifferentmolecularmarkersystemsdrastically increased the total number of polymorphic markers available to plant breeders, and to molecularbiologistsingeneral.Themostnotablehigh-throughputgenotyping(HTG)system issinglenucleotidepolymorphisms(SNPs),whichhavebeenusedintensivelyinquantitative traitlocus(QTL;see Glossary)discovery.More than 10000 QTLs usingdifferentmarker systemshavebeenreportedinmorethan120studiescovering12plantspecies[1]thataimed toimprovequantitativetraitsofeconomicimportance.Initially,molecularmarkerswere inte-gratedintraditionalphenotypicselection(PS)byapplyingmarker-assistedselection(MAS). Forsimpletraits,MAScomprisesselectingindividualswithQTL-associatedmarkersthathave majoreffects;markersnotsignificantlyassociatedwithatraitarenotused.However,attempts toimprovecomplexquantitativetraitsbyusingQTL-associatedmarkerdetectionhavebeen unsuccessfulduetothedifficultyoffindingthesameQTLacrossmultipleenvironments(dueto QTLenvironmentinteractions)orindifferentgeneticbackgrounds[2].

LinkageanalysisforQTLmappingisdoneonbiparentalpopulations,buthaslowpowerfor detectingmarker–traitassociationduetochromosomeswithlowrecombinationrates. There-fore,associationmappingstartedduringtheearly2000swiththeobjectiveofovercomingthe lowpower oflinkage analysis,thusfacilitating thedetectionof marker–trait associations in

Trends

Inrecentyears,theglobalclimatehas changed,resultingindrastic fluctua-tionsinrainfallpatternsandincreasing temperature.Suddenclimatechanges cancausesignificanteconomiclosses tocountriesworldwide.

Geneticimprovementofseveral eco-nomicallyimportantcropsduringthe 20thcenturyusingphenotypic, pedi-gree,andperformancedatawasvery successful. However, signs of grain yieldstagnationinsomecrops, espe-ciallyindrought-stressed and semi-aridregions,areevident.

Genomicselectionoffersthe opportu-nitytoincreasegrainproductioninless time.InternationalMaize and Wheat ImprovementCenter(CIMMYT)maize breeding research in Sub-Saharan Africa,India,andMexicohasshown thatgenomicselectioncanreducethe breedingintervalcycletoatleasthalf theconventionaltime andproduces linesthat,inhybridcombinations, sig-nificantly increasegrain yield perfor-mance over that of commercial checks.

Publicandprivateinvestmentincrop genomic selection research should increasetosuccessfullydevelopinless time germplasm that is adapted to suddenclimatechange.

(2)

nonbiparentalpopulationsandfine-mappingchromosomesegmentswithhighrecombination rates.However,themainproblemoffine-associationmappingisthelowpowerfordetecting rarevariantsthatmaybeassociatedwitheconomicallyimportanttraits[2].Thus,thechallenge ofassociationmappingandQTLdetectionresidesinidentifyingandquantifyingrareQTLswith smalleffects for economicallyimportant traits thatare highly affectedby theenvironment. However,becausethecostofSNPassayshasdramaticallydecreased,thepossibilityofusing high-densitySNP arrays(tensofthousands) hasresulted inthe developmentofstatistical modelstopredictmarker–traitassociationaccurately,dependingonthegeneticarchitectureof thepredictedtrait.

Contrary to QTL and association mapping,GS uses all molecular markers for GP of the performanceofthecandidatesforselection.Therefore,theaimofGSistopredictbreeding and/orgeneticvalues.GScombinesmolecularandphenotypicdatainatrainingpopulation (TRN)toobtainthegenomicestimatedbreedingvalues(GEBVs’)ofindividualsinatesting population(TST)thathavebeengenotypedbutnotphenotyped[3].Figure1Adepictsthetwo basicpopulationsinaGSprogram:theTRNdata,whosephenotypeandgenotypeareknown, andtheTSTdata,whosegeneticvaluesaretobepredicted.GSisusedinplaceofphenotyping forafew selectioncycles.Themainadvantages ofGS overphenotype-basedselection in breedingarethatitreducesthecostpercycleandthetimerequiredforvarietydevelopment.In termsofcostreductioninmaizebreeding,thebreedercantestcross50%ofallavailablelines, evaluatingtheminfirst-stagemulti-locationaltrials,andcanthenusethephenotypicdatato predictthe remaining 50% byGS. Figure 1B showsthe advantageof GS overPS for:(i) reducingcosts,upto50%;and(ii)savingtimebyselectinglinesdirectlyforstageIIinstead goingthroughstageI(usedinPS).Thissignificantlyreducesthecostoftestcrossformationand evaluationateachstageofmulti-locationevaluations.ThetimeefficiencyoverPScouldcome fromthesecondcycleofselection,whichusestheTRNfromthepreviouscycletopredictthe new doubled haploid (DH) lines, thus excluding testcross formation and first-stage multi-locationevaluationtrials.BasedonGS,thebestlinescouldgodirectlytothesecondstage ofmulti-locationevaluations.

GSpredictsthebreedingvalues(BVs)ofthecandidatesforselection.BVshavetwo compo-nents:the parentalaverage (the mean BV of both parents) and the deviation of progeny performancefromthisaveragethatisduetoMendeliansampling.Inconventionalbreeding,the parentalaverageisquantifiedbypedigreeinformation(ifthegenealogyisavailable),fromwhich arelationshipmatrixAbetweentheindividualscanbederived.Mendeliansamplingassesses within-familyvariationthatisquantifiedbytestingtheprogenyinmultienvironmentfieldtrials. GStakesadvantageofdensemarkerstoquantifyMendeliansampling,thusavoidingtheneed toextensivelyphenotypethe progeny.Thissavestimeby reducingthe cyclelength,while enhancingtheexpectedgeneticgainandtheselectionresponseperunittime;italsouses lessresourcescomparedwithextensivephenotyping.GShasthepotentialtoquicklyimprove complextraitswithlowheritabilityaswellastosignificantlyreducethecostoflineandhybrid development.GScanalsobeusedforsimpletraitswithhigherheritabilitythancomplextraits, forwhichhighGP accuracyisexpected.TheapplicationofGS inplant breedingcouldbe limitedbytwomainfactors:(i)genotypingcosts;and(ii)unclearguidelinesastowhereGScan beefficientlyappliedinabreedingprogram.

GSandGP havebeenappliedusingtwo differentapproaches.Onefocusesonpredicting additiveeffectsinearlygenerationsofabreedingprogram(F2:3)toachievearapidselection

cyclewithashortinterval(i.e.,GSattheF2levelofabiparentalcross).Inthiscase,researchers

are interested in predicting the additive values (BVs) rather than the total genetic value; therefore, additive linear models that summarize the effects of the markers are sufficient. The other approach predicts the complete genetic values of individuals considering both

1

InternationalMaizeandWheat ImprovementCenter(CIMMYT),Apdo. Postal6-641,06600,MexicoCity, Mexico

2

ColegiodePostgraduados, Montecillo,Texcoco,56230,Edo.de Mexico,Mexico

3

UniversidaddeQuintanaRoo, QuintanaRoo,77019,Mexico

4

FacultaddeTelemática,Universidad deColima,Colima,28040,Mexico

5

DepartmentofAgronomyand Horticulture,Universityof Nebraska-Lincoln,321KeimHall,Lincoln,NE 68503-0915,USA

6

DepartmentofEpidemiology& Biostatistics,MichiganState University,909FeeRoad,Room B601,EastLansing,MI48824,USA

7

InternationalCropsResearchInstitute fortheSemi-AridTropics(ICRISAT), Patancheru502324,Telangana,India

8

InternationalRiceResearchInstitute, LosBaños,4030,Philippines

*Correspondence:

[email protected](J.Crossa)and

[email protected]

(3)

Glossary

Breedingvalue(BV):theworthof anindividualasestimatedbythe averageperformanceofits progenies.

Array:high-throughputgenotyping platformcontainingeither genome-wideSNPs(SNParray)orcoding variants(exomearray). Geneticgain:theamountof increaseinperformancethatis achievedthroughgenetic improvementprogramsbetween consecutiveselectioncycles. Genomicestimatedbreeding value(GEBV):breedingvalueofan individualcalculatedusing genome-widemarkerdatatocapturesmall geneticeffectsdispersedoverthe genome.

Genomicselection(GS):atypeof breedingmethodologythatselects thebestcandidatesasparentsfor thenextselectioncyclebasedon theirpredictedbreedingvalues computedgiven:(i)theirgenotypes; and(ii)thephenotypesand genotypesoftheirrelatives. Genotyping-by-sequencing(GBS): ahighlymultiplexgenotypingsystem thatinvolvesDNAdigestionwith enzymesfollowedbyconstructionof areducedrepresentationlibrary, whichissequencedusinga next-generationsequencingplatform. Heteroticgroup:agroupof genotypes,irrespectiveoftheir geneticrelatedness,thatdisplay similarcombiningabilityandheterotic responsewhencrossedwith genotypesfromothergenetically distinctgermplasmofthesame group.

Linkagedisequilibrium(LD):the nonrandomassociationsofallelesat twolocithatcanoccurevenifthe twogenesareunlinked.

Marker-assistedselection(MAS): aprocessforselectingindividuals basedontrait-linkedmarkers. Marker-assistedrecurrent selection(MARS):atypeofMAS thatallowsthesimultaneous identificationofassociatedallelesas wellastheaccumulationofsuperior allelesfrombothparents. Quantitativetraitlocus(QTL):a gene(DNAsectionalsocalledlocus) thatisassociatedwithvariationina phenotype. (A) Phenotype TRN populaon TST populaon Genotype with genome-wide markers Develop a predicon equaon Predict breeding values for individuals in TST Genotype with genome-wide markers

TRN and TST populaons in genomic selecon

(B)

Phenotypic selecon Phenotypic + genomic selecon Year 1 1 2 2 3 3 4 4 5 5 6 6 B B B B B B A A A A A A A x B A x B F1 - inducer F1 - inducer Haploids Haploids Haploids Haploids

DH per se – seed increase

DH per se – seed increase

DH per se – seed increase + genotyping

DH per se – seed increase + genotyping DH for TCs – stage I

DH for TCs – stage I DH for TCs – stage II GS model

GS model 50% in cold room 50% DH for TCs – stage I

First stage mullocaon evaluaon of TCs

First stage mullocaon evaluaon of TCs

First-stage mullocaon evaluaon of TCs

Second-stage mullocaon evaluaon of TCs

Line C x line D Line C x line D

F1 - inducer F1 - inducer

Season

Figure1. PopulationsUsedinGenomicSelectionandaSchemeofPhenotypicandGenomicSelectionin MaizeBreeding.(A)Genomicselection(GS) requiresa trainingpopulation (TRN)thathas beengenotypedand phenotypedandatestingpopulation(TST)thathasonlybeengenotypedbutnotphenotyped.(B)Reductionofcycle lengththroughGSinmaizeusingdoubledhaploids(DH)crossedwithatester(TC).

(4)

additiveand nonadditive(dominanceand epistasis) effects, thereby estimating the perfor-mance (commercial value) of thecultivars. Geneticvalues oflines are predictedfor some environmentsusinganincomplete(sparse)multienvironmenttestingscheme.

Several genetic and statistical factors complicate the practical application of GP. Genetic difficultiesarisefromthesizeanddiversityoftheTRNpopulationandtheheritabilityofthetraits tobepredicted.Statisticalchallengesarerelatedtothehighdimensionalityofmarkerdata, wherethenumberofmarkers(p)ismuchlargerthanthenumberofobservations(n)(p>>n)and themulticolinearityamongmarkers(adjacentmarkersarehighlycorrelated).Formoredetails, see‘Thecomplexityofgenomicselectionandprediction’and‘Solutiontoaninverseproblem’ inthesupplementaryinformationonline.

Here, we reviewadvances inGS and GP theory inlight of theabove considerations and evaluaterecent examples from GP applied to cerealand legume breeding programs.We describetheevolutionandmainfeaturesofGPmodels,includingcomplexities,strengths,and weaknesses.WethenillustratetheuseofGPusingexamplesfromcropbreedingprograms withgenomicGEinteractions,aswellasresultsofgeneticgainsfromrapidcycleGSin maize.WealsospeculateontheprospectsforGSandGPinplantbreeding.Mostoftheresults presentedinthisreviewincludestudiesperformedonmaizeandwheatfromtheInternational MaizeandWheatImprovementCenter(CIMMYT),aswellasonchickpeafromtheInternational CropsResearchInstitutefortheSemi-AridTropics(ICRISAT)

Genomic-EnabledPredictionModelsandApplications:Copingwith Complexity

ThecomplexityofapplyingGPinbreedingoccursatdifferentlevelsandisinfluencedbyseveral factors.Whenatraitisaffectedbyalargenumberofloci,GPaccuracydependsonseveral geneticfactors:(i)thesizeandgeneticdiversityoftheTRNpopulationanditsrelationshipwith theTST population[4];thatis,whetherthecultivars inthe TRNarerelatives(closeand/or distant)ofcultivarsintheTSTset;(ii)theheritabilityofthetrait(s)underselection[complextraits withlowheritabilityandsmallmarkereffectsaresuitableforGSandGP,whereaslesscomplex traits(withhighheritability)canbepredictedbyafewmarkerswithrelativelylargeeffects);and (iii)forcomplextraitswithlargenumbersofmarkersthatarenotinlinkagedisequilibrium(LD) withthe QTL, GP accuracy is lower [5]and increases when the heritability and TRNsize increase.StudieshaveshowntheimportanceofselectinganappropriateTRNpopulationthat optimizestheaccuracyofthepredictionsofthenonphenotypedcultivarsintheTSTset[6]. Dependingonthetrait,theincreaseinGPaccuracyreachesaplateauasthepopulationsize increases.Asimilartrendwasfoundforthenumberofmarkers[7,8].

Oneimportantgenetic-statistical complexityofGP modelsariseswhenpredicting nonphe-notypedindividualsinspecificenvironments(site–yearcombinations)byincorporatingGE interactionsintothestatisticalmodels.Equallyimportantisthegenomiccomplexityrelatedto GEinteractionsformulti-traits;theseinteractionscreatetraitandenvironmentalstructures that shouldbe dealt with byusing statistical-genetic models that exploitmulti-trait, multi-environmentvariance-covarianceandgenetic correlationsbetweenenvironments, between traits,and between traits and environments, simultaneously. Untanglingthe complexityof multi-traitgenomicsandmultipleenvironmentsrequiresatheoreticalframeworkthataccounts forthesecomplexinteractions[9](see‘Bayesianmulti-traitmultienvironmentgenomicmodel fornormalphenotypes’inthesupplementaryinformationonline).Interestingly,theuseofGPto improvediseaseresistancehasbeenchallenginginwheatfortworeasons:(i)selectionfor majorresistancegenescanbeephemeralduetochangesinpathogenraces;and(ii)breeding forminorresistancegeneswithsmalleffectsthroughoutGS(whichprovides durable resis-tance)mayfacetheusualcomplexitiesencounteredinGS[10].

(5)

AnotherlevelofcomplexityoccursinGSstatisticalpredictionmodelsbecausethenumberof markers (p) is larger than the population size (n) and the predictors (markers) are highly correlated. Thissituation results in a matrix of predictors that is rank deficient, making it impossibletocomputeleast-squareestimatesformarkereffects.Thecomplexityarisesfrom factorssuchasthecourseofdimensionality[11];thatis,undermodelswithp>>n,whichare notlikelihoodidentifiedandarepronetooverfitting,spuriousfeaturesanddatastructuresmay becaptured(see‘Thecomplexityofgenomicselectionandprediction’and‘Solutiontoan inverseproblem’inthesupplementaryinformationonline).Solutionstotheseproblemsinclude theuseof:(i)penalizedregression;(ii)variableselection;and(iii)dimensionalityreduction(e.g., principalcomponents),suchthatanewsetofpredictorsthatarenotcorrelatedisgenerated fromtheoriginalone(markers),thusallowingtheuseofunivariatedistributionsanddecreasing the computationtimeof the estimatesandthe prediction [12].A fourthsolution is to use statisticalmodelsthatassessGPcomplexitiesandhigh-densitymarkerplatformswithGE interactions,therebyaddingpowertotheGPmodels(seethenextsection).

GPmodelsbasedonbasicquantitativegeneticsdescribethephenotypicresponseasthesum ofageneticvalue(linearadditivemodels)andaresidualvalue.AlargebodyofGPresearchhas focusedondevelopingefficientparametricandnonparametricstatistical andcomputational models withincreasedaccuracy forpredicting nonphenotyped genotypes[13]. Ingeneral, thesetheoreticalstudiesshowreasonablygoodpredictionaccuraciesforcomplextraitssuch asgrainyieldandothertraitsevaluatedbymeansofindependentrandomcross-validationdata partitioning.IncontrasttothewidespreaduseofGPtopredicttheperformanceofonetraitin theTST populations usingdatafrom the sametraitobservedinthe TRNpopulations,the complexityofextendingthistomulti-traitGPindiceshasnotreceivedmuchattention,except foramethodproposedbyCerón-Rojasetal.[14]thatisbasedonthemulti-traitGenomicBest Linear Unbiased Estimator (GBLUP) selection index, which worked well when applied to simulatedandrealdatasets.

Withadvancesin GS and GP, datavolumes andcomplexity have increased dramatically, leading to novel interdisciplinary research efforts to integrate computer science, machine learning,mathematics, physics,statistics, geneticsandquantitative genetics,and bioinfor-matics.Suchworkhasemergedasanewfieldofresearch(commonlyknownas‘datascience’ ordata-drivenscience)thataimstounifystatisticswithdataanalysis,datamining,andsoon. Theinterdisciplinaryresearchersindatasciencefocusoncomputingmoreaccuratepredictive valuesbyusingstatisticalmodelsormachine-learningmodels(R.McDowell,MScthesis,Iowa State University,2016). Neural networkmethodsare commonprediction toolsinmachine learning.Neuralnetworkscompriselayersofinterconnectedneurons,wheretheoutputofeach neuronisexpressedasthesumofacertainnumberofinputstoaneuronlocatedinaspecific networklayer,withaweightplusabias;thesumofallinputsis weightedbyan activation function.

WhenneuralnetworksareappliedtoGP,theinputlayeriseachmarkerwithoneneuronper marker;eachoftheneurons(markers)intheinputlayerisconnectedtoalltheneuronsinthe firsthiddenlayer,andtheseareconnectedtoallneuronsinthesecondhiddenlayer,andsoon, uptotheoutputlayer,whichisonehiddenneuronlayerwiththepredictionofeachofthe phenotypes.Recentdevelopmentsinneuralnetworksandspeediercomputerprocessinghave allowedtheadditionofnewlayerstotheneuralnetwork(deepmachinelearning)tocapture smallcrypticcorrelationsbetweeninputs[15],whichinGPareinteractionsbetweenmarkers. Initialapplications of machine-learning and neural networks in GP were demonstrated by Gianolaetal.[16,17],González-Camachoetal.[18,19],Pérez-Rodríguezetal.[20],Ornella etal.,[21],andGonzález-Recioetal.[22].Recentresultsfordeepmachinelearningappliedto GPcanbefoundelsewhere(R.McDowell,MScthesis,IowaStateUniversity,2016).

(6)

TheAccuracyofGPModels,andGeneticGainsAchievedbyGS

TheGPmodelsassessdifferentpredictionproblemsthatattempttomimicwhathappenswhen predictionsaremadeinrealsituations.Differentrandomcross-validationschemeshavebeen designedtosimulatethepredictionproblemsthatresearchersmayfacewhenperformingGS. There are four basic scenarios arising from combinations of tested (observed) lines (LT), untested(unobserved)lines(LU)withtested(observed)environment(ET),oruntested (unob-served)environments (EU). Predicting newly developed lines (or cultivars) in environments wherethey were nottestedisa caseof LU-ET(randomcross-validation1, CV1).Another problemistopredictlinesinsomeenvironmentsbutnotinothers;thisisLT-ET(random cross-validation2,CV2),whichattemptstomimiconeoftheobjectivesofGP:sparsetesting.Another problemcomprisespredictinglinesinuntestedenvironments;thatis,LT-EU(random cross-validation0,CV0).Finally,thereistheproblemofpredictinglinesneverobservedin never-observedenvironments,LU-EU(cross-validation00,CV00)[23–27].

Simulationandempiricalresultsobtainedbyrandomcross-validationsuggestthatGS enhan-cesgeneticgainsbyshorteningthebreedingcycle(rapidselectioncycle)and/orenhancing testingefficiencyinfieldevaluations[3,28–31].Resultsofusingrandomcross-validationon maizeandwheatbreedingdataindicatethatGScansignificantlyenhancepredictionaccuracy relatedtopedigreeandMASforlow-heritabilitytraits[13,18–20,32–43].ResultsofapplyingGS inmaizeandwheatbreedingindicateitseffectivenessinselection[44–48].

BreedingprogramsworldwidehavebeenstudyingandapplyingGSandGPinseveralcrops.In parallel,extensiveresearchhasresultedinnovelstatisticalmethodsthatincorporatepedigree, genomic,andenvironmentalcovariates(e.g.,weatherdata)intostatistical-geneticprediction models. GBLUP models [49,50] are widely used in GP, and the extension of GBLUP for incorporatingGEinteractionshasimprovedtheaccuracyofpredictingunobservedcultivars in environments [23,24,51–56]. New models for assessing the GP accuracy of discrete responsevariables(e.g.,ordinaldiseasedata, suchas rates,countdata, andsoon)were proposed[57–61]togetherwithBayesian genomicmodelsfor analyzingmultipletraits and multipleenvironments.AcomputationallyefficientMarkovChainMonteCarlo(MCMC)method thatproducesfullconditionaldistributionsoftheparameters,leadingtoexactGibbssampling fortheposteriordistribution,hasalsobeendeveloped[9].Resultsfromsimulatedand(two) extensivedatasetsshowthat,whenthecorrelationbetweenthe traitsishigh,aproposed modelwithanunstructured covariancematrix ispreferredoverthe diagonalandstandard methodstohelpimprovethepredictionaccuracyforgrainyield.However,whencorrelations arelow,itisenoughtousethestandardmodel[9].

Depending onthe complexity of the traitand the prediction scenario, moresophisticated modelsresultinmoderate-to-highgainsinpredictionaccuracy.Inseveralstudies,complex models increased prediction accuracyby >10% at noadditional cost.The useof simple modelsmaymissimportantdatafeaturesandcauselossesofpredictionaccuracyforcomplex traits,wherenonlinearmodelsusuallygivesignificantlyhigherpredictionaccuracythanlinear models[20,36,55,56].OneofthefirstassessmentsofGPinwheatbreedingwasperformedon acollectionof599 wheatlinesevaluatedinfour environmentsusingpedigreeandgenomic informationwithtwodifferentmodels[34].ThemodelswerethestandardGBLUPwiththelinear genomicmatrix Gandanonparametricmodel,ReproducingKernelHilbert Spaces(RKHS) regressionwithanonlineargenomicmatrix,theGaussiankernel(GK)[62].Themostcomplex model,RKHSusingaGKnonlinearkernel,includingpedigreeandmarkerinformation,gavethe highestpredictionaccuracy,rangingfrom15%to36%,withrespecttothepedigreemodel alone.TheGP modelRKHSwithGK,pedigree,andmarkershasbeenusedforpredicting resistancetoleaf,stem,andstriperust,septoria,tanspot,andStagonosporanodorumblotch

(7)

inwheat;comparedwithstandard least-squaresmultipleregressionmethods,RKHSgave increasesinaccuracyof42%and48%inthetworeportedstudies[63,64],respectively.

Basedontheencouragingresultsobtainedinsomemajorcereals,preliminarystepshavebeen takentodeployGStodevelopsuperiorlinesmorequicklyandenhancetherateofgeneticgain inafewlegumecrops,suchaspea,soybean,chickpea,groundnut,andpigeonpea[65].The recentavailabilityofcost-effective,high-throughputsequencinghasfacilitatedthe develop-mentoflarge-scalegenomicresourcesinmostlegumes.Soybeanwasthefirstlegumecrop where GS was deployed for improving yield and agronomictraits using genotyping-by-sequencing(GBS)inabreedingprogram[25].ToassesstheutilityofGSinsoybeanbreeding programsandunderstandtheeffectofmarkerselectionandgenotypeimputation,prediction accuracieswerecalculatedfortwogenomicpredictionmodels,namely,thestandardGBLUP modelwithadditiveeffectsandanextendedGBLUPwithadditive-by-additiveepistasis.High predictionaccuracy(0.64)indicatedthepotentialofusingGStoimprovegrainyield[25].

Inthecaseofchickpea,acollectionof320elitebreedinglineswasgenotypedusingDiversity Arraytechnology(DArTseq)andphenotypedforyield-relatedtraitsintwoenvironmentswith twodifferenttreatments(i.e.,rainfedandirrigated)intwodifferentseasons[65,66].Various statistical models (RR-BLUP, Kinship GAUSS, BayesCp,Bayes B, Baysian LASSO, and randomforestregressionorRFR)resultedinhighpredictionaccuraciesforthetraitsofinterest; however,notmuchvariationinpredictionaccuracyamongthedifferentmodelswasobserved

[65,66].Whenpopulationstructurewasincludedinthemodel,predictionaccuraciesimproved slightlyfordaystomaturity(DM),daystoflowering(DF),andseeddryweight(SDW),butnotfor seedyield(SY).

Ingeneral,earlystatisticalmodelsdevelopedforGPinanimalbreedingwerebasedon single-environmentassessments.However,inplantbreeding,GEinteractionsareofparamount importance.JustasGEinteractionsareafundamentalchallengeinplantbreeding,theyare alsoincreasinglyrecognizedasamajorcomplexityinGPmodels.

GPIncorporatingGenotypeEnvironmentInteraction

Includinghigh-densitymarkerplatformswithGEinteractionsincreasestheaccuracyofGP models; this has been extensively studied in bread wheat, maize, and legumes [23– 27,55,56,67].In all GP models that incorporateGE interactions, accuracywithrespect tosingle-environmentanalysesincreased10–40%onaverageinallthreecropspecies.The main models used to assess GP accuracy by incorporating GE interactions and their applicationtorealdataaredescribedbelow.

MultienvironmenttrialsforassessingGEinteractionshaveanimportantroleinplantbreeding forselectinghigh-performingandstablelinesacrossenvironments.Burgueñoetal.[23]were thefirsttousemarker-andpedigree-basedGBLUPmodelsforassessingGEinteractions undergenomicprediction,whileHeslotetal.[52]incorporatedcrop-modelingdatatostudy genomicGEinteractions.Areactionnormmodel,wherethemainandinteractioneffectsof markersandenvironmentalcovariatesare introducedusing high-dimensionalrandom vari-ance-covariance structures of markers and environmental covariates, was developed by Jarquínetal.[24]asanextensionofthewell-knownGBLUPmodel.

Thebaselinemodelforphenotypesevaluatedindifferentenvironments(yij)canbedescribed

usingEquation1:

(8)

wheremistheoverallmean,Ei(i=1,...,I)istherandomeffectoftheithenvironment,Ljisthe

randomeffectofthejthline(j=1,...,J),EL

ijistheinteractionbetweentheithenvironmentand

thejth line,andeijistherandomerror term.Theassumptionsare asfollows:EiiidN 0;s2E

 , Lj iid N 0;s2 L  ,ELij iid N 0;s2 EL  ,andeij iid N 0;s2 e 

,withN(.,.)denotinganormaldistribution,and ‘iid’ standing for independent and identically distributed. However, when the number of environmentsissmall,it maybebettertoassumeit asafixedeffect.

MarkerscanbeintroducedinEquation1suchthattheeffectoftheline(Lj)canbereplacedbygj

expressedasalinearregressiononmarkercovariates(itapproximatesthegeneticvalueofthe jthline).Thevectorcontainingthe‘genomicvalues’isgN 0;Gs2

g

 

,wheres2

gisthegenomic

variance,andGisagenomicrelationshipmatrix.Also,theeffectoftheline(Lj)canbereplaced

byaj,withaN 0;As2a



,whereAisthenumericaladditiverelationshipmatrixderivedfrom pedigree,ands2

aistheadditivevariance.TheinteractioncovariancematrixistheHadamard

productoftwocovariancestructures,onedescribingrelationships betweenlinesbasedon geneticinformation(pedigreeorgenomic)andtheotherrelatingenvironmentsbymeansof environmentalcovariates.Whentheenvironmentaleffectisassumedasfixed,theinteraction termistheHadamardproductofthefixedeffectofenvironmentsandthecovariancematrixof linesthatwasbuiltwiththegeneticinformation.Whenenvironmentalcovariablesareused,the namedreactionnormisjustifiedbecausethegenotypiceffectisareactiontothose environ-mentalcovariables,whereas,whenenvironmentalcovariablesarenotused,thereactionnorm modelshaveunknownenvironmentaldeviations.

Thisreactionnormmodel[24]hasbeenappliedsuccessfully usingpedigreeandmolecular markersinmultienvironmentsaddingenvironmentalcovariates,forexample,incottontrials withenvironmentalcovariates[68],inGPofextensivewheatgenebankaccessions[69],inGP of Fe and Zn in wheat grain [70], in GP of bread wheat lines in sites locatedin diverse agroecologicalzones[27,25],inGPpredictionofwheatlinesevaluatedinMexicoandpredicted inlocationsinSouthAsia[71],andinGPofextensivefieldtrialsinwheatondifferentcontinents

[67].ThereactionnormmodelwasalsoextendedtoGEinteractionswithmaizeandwheat diseaseordinalandcountdatabyMontesinos-Lópezetal.[57–60](see‘Bayesian genomic-enabledpredictionmodelsforordinalandcountdataincorporatinggenotypeenvironment interaction’ inthe supplementary information online). The increase in GP accuracy of the reactionnormmodelwithGEinteractionswasonaverage7–20%relativetotheprediction accuracyoftheGBLUPwithoutincludingGEinteractions.

InareassuchasKansas,USA,wheatproductionisimpactedbyyearlyclimatefactors,suchas extremetemperaturesanderraticprecipitation.Yearlyeffectsarenotrepeatableandrepresent thedynamicpartoftheGEinteractions,whereassiteeffectsrepresentthestaticrepeatable component.The accuracyof predicting unobserved historical sites in the wheat breeding programofKansasStateUniversityreaches0.54,butunobservedyearscanbepredictedwith only0.17accuracy[26].Resultsofusingthereactionnormmodelwithpedigreeandgenomic informationforpredicting400–500breadwheatlinesinSouthAsianlocationsusing approxi-mately 60000 lines trained in different Mexican environments indicated that GP is more accurate (0.35) than PS for predicting unobserved wheat lines in different South Asian locations(0.20)[71];perhaps thislastresultisthesimplestproofofthe conceptthatGP worksbetterthanPS.

GPIncorporatingMarkerEnvironmentInteractionModels

TheGEinteraction modeldescribedby López-Cruzetal. [53] decomposes themarker effectsintocomponentsthatarecommonacrossenvironments(stability)and environment-specific deviations(interaction) [see‘The GE (orME) model withlinear kernel’inthe supplementaryinformationonline].Thismodelborrowsinformationfromacrossenvironments

(9)

whileallowingmarker effectsto changeineach environment.Itcanbeimplemented using shrinkagemethodsaswellasvariableselectionmethodsand,thus,canbeusedtoidentify genomicregionswhose effectsare stableacrossenvironmentsandotherregions thatare responsibleforGEinteractions[54].TheGEinteractionmodelofLópez-Cruzetal.[53]is bestsuitedforthejointanalysisofpositivelycorrelatedenvironmentsandwasusedtoanalyze threeCIMMYTwheatdatasets.ThepredictionaccuracyoftheGEinteractionmodelwas greaterthanacross-environmentorsingle-environmentanalyses(5–29%whenpredictingeach ofthe environments).Recently,this genomicGEinteraction modelwas usedto predict untesteddurumwheatlinesinenvironments,aswellasavariableselectionmodeltoidentify genomicregionswhoseeffectsarestableacrossenvironmentsandothersthatare environ-mentspecific[54].

InthemodelsofJarquínetal.[24]andLópez-Cruzetal.[53],thekernelusedisthelinearkernel GBLUP.Inanewstudy,Cuevasetal.[55]proposedaGEinteractionmodelsimilartothatof López-Cruzetal.[53]butwithanonlinearkernel,theGaussiankernel(GK),whichissimilartothat usedintheRKHS [62][see‘TheGEmodelofLópez-Cruzetal.(2015)withanon-linear Gaussiankernelmethod’inthesupplementaryinformationonline].Usingtwoextensivedatasets, theauthorsfoundthat,forthewheatdatasets,theGKgavepredictionaccuraciesupto17% higherthantheGBLUPlinearkernel.Forthemaizedataset,theGKwasonaverage5–6%higher thantheGBLUPlinearkernel.TheadvantageoftheGKovertheGBLUPisthatitisamoreflexible kernelthataccountsforsmallandcomplexmarkermaineffectsandspecificinteractions. OneweaknessofbothGEinteractionmodelswithalinear[53]andnonlinearkernel[55]is thatpositivecorrelationsbetweenenvironmentsareassumed.However,whenthecorrelation between environments is low or negative, these models do not increase the prediction accuracyofenvironmentswithnegligibleornegativecorrelations.ThestrengthoftheBayesian modelswithGEinteractionsusedunderthe linearkernelGBLUPorunderthenonlinear Gaussian kernel (GK) is that they overcome the limitation of the previous models when associations betweenenvironments arenegligible ornegative, as shown by Cuevas etal.

[56].TheseauthorsproposedconsideringthegeneticeffectsðuÞdescribedbytheKronecker productofvariance-covariancematricesofgeneticcorrelationsbetweenenvironmentsand genomic kernels through markers under two linear kernel methods: linear (GBLUP) and Gaussian(GK).Anextension includesthesamegeneticcomponent asthefirstmodelðuÞ, plusanextraresidualgeneticcomponent,f,whichcapturesrandomeffectsbetween environ-mentsthatwerenotaccountedforbytherandomeffectsu(see‘Multienvironment genoty-peenvironment interactionmodelwithlinear andnonlinearkernels’inthesupplementary informationonline).Resultsoftheanalysesoffivedatasetsshowedthat:(i)GEinteraction modelsalwayshadsignificantlyhigherpredictionaccuracythansingle-environmentmodels; and(ii)thepredictionaccuracyofGEmodelswithuandfoverthemultienvironmentmodel withonlyuwashigher85%ofthetimewithGBLUPand45%ofthetimewithGKacrossthe five data sets.Results indicated that including the random effect f was still beneficial for increasingGP accuracyafteradjusting fortherandomeffectu.Predictionaccuracyofthe GEinteractionmodelmethods(GBLUPorGK)increasedupto85%overtheaccuracyof thesingle-environmentmodel.

MachineLearningforGenomicPrediction

InanappliedGScontext,thefocusshouldnotbeonpredictingallindividuals,butratheron classifyingindividualsintoupper,middle,orlowerclasses,dependingonthetraitunderselection. UsingclassifiersinGSisattractivebecausetheyaretrainedtomaximizetheprobabilityofan individualbeingamember ofthetargetclass,ratherthansearchingforitsoverall performance[21]. González-Camachoetal.[18]usedaradialbasisneuralnetworkonanextensivegenomicmaize datasetcomprisingseveraltraitsindifferentenvironmentsandcomparedtheresultswithRKHS

(10)

andwithalinearregressionmodel.Theaccuracyofneuralnetworkmethodswassimilartothatof RKHSandslightlyhigherthanthatofthelinearregression.

Inarecentstudy,twoneuralnetworkclassifiers(amultilayerperceptron,MLP,anda probabi-listic neural network, PNN) were compared for predicting the probability of an individual memberofatargetphenotypicclass,using33maizeandwheatgenomicandphenotypic datasets[19].Theauthorsfocusedonthe15thand30thpercentilesoftheupperandlower classestoselectthebestindividuals,ascommonlydoneinGS(fortraitssuchasgrainyield,the upperclassesarethetarget;fordiseases,thefocusisonthelowerclasses).Thecriterionfor assessing the prediction accuracy of MLP and PNN was the area under the receiver operatingcharacteristiccurve(AUC). Theparametersofbothclassifierswereestimated byoptimizingtheAUCforaspecifictargetclass.ThePNNwasfoundtobemoreaccuratethan theclassifierMLPforassigningmaizeandwheatlinestothecorrectupper,middle,orlower class.Resultsforthewheatdatasetwithcontinuoustraitssplitintotwo andthree classes showedthattheperformanceofPNNwiththreeclasseswasbetterthanPNNwithtwoclasses whenclassifyingindividualsintotheupperandlower(15%or30%)categories.Dependingon themaizetrait–environmentcombination,AUCforPNN30%orforPNN15%uppertrait(grain yield)washigherthantheAUCofMLP.Forthelowerclass,flowering(maleandfemale)traits, PNN15%andPNN30%alwayshadbetterAUCthanMLP15%andMLP30%[19].

GeneticGainsfromRapidSelectionCycleGS:CIMMYTMaizeBiparentalPopulations

ThefundamentalgoalofusingGSinbreedingistoachievegreatergeneticgainsatalowercost andinlesstimethanwithconventionalpedigreebreeding.Toachieveashorterintervalcycle,a favorableuseofGSispredictionwithinfull-sibfamilies,becausebiparentalpopulationshave highLDbetweenmarkerallelesandtraitalleleswithnogroupstructure.

TherearefewstudiesthatmeasurethegeneticgainsachievedthroughuseofaGS-based rapidselectioncycle.ThefirststudyconfirmingthepromiseofarapidselectioncycleinGPof biparentalpopulations,aswellaspreviousfindingsfromrandomcross-validationstudies,was conductedbyMassmanetal.[44]andshowedthatGSimprovedmaizegeneticgainsperunit oftime.GeneticgainswerealsoreportedbyAsoroetal.[45]inoatandbyRutkoskietal.[48]in wheat,whichshowedthatGSandPSprovidedsimilarrealizedgeneticgainsperunitoftime.

GeneticgainstudiescomparingGScyclesC0,C1,C2,andC3withpedigreeselectiononeight

CIMMYTtropical biparental maize populations in Sub-Saharan Africa were conducted by Beyeneetal.[47] underdrought conditions.Theauthorsshowedthat:(i)theaveragegain percycle(acrossalleightbiparentalpopulations) fromGS was0.086t/haunder managed droughtconditions;(ii)theaveragegrain yieldofC3-derivedhybridswassignificantlyhigher

thanthatofhybridsderivedfromC0;and(iii)threeGScyclescanbeachievedin1year.The

authorsconcludedthathybridsderivedfromC3produced7.3%highergrainyieldthanthose

developedthroughconventionalpedigreebreeding.Bycontrast,theaveragegainpercycle usingmarker-assistedrecurrentselection(MARS)acrosstenpopulationswas0.051t/ha percycle under managed drought stress. Extensive field trialswere conducted inseveral manageddroughtenvironmentsinSub-SaharanAfricatoevaluatethegrainyieldperformance ofmaizehybridsderivedfromGS-basedlinesselectedfromdifferentpopulations.Linesderived fromcycleC3GSinhybridcombinationproducedsignificantlyhigheraveragegrainyieldthan

linesfromC0GSinhybridcombination(Y.Beyene,2017).

Anotherexampleofgeneticgainsfromrapid-cycleGSonCIMMYTmaizeistwo biparental maizepopulations(F2:3)fromAsia(CAP1andCAP2)thatweredevelopedandevaluatedfor

testcrossperformanceunderdroughtandoptimalconditions[72].Thegeneticgainsperyear forPSversusGSindroughtenvironmentswere0.067t/haversus0.124t/ha,respectively,for

(11)

CAP1,and0.076t/haversus0.104t/ha,respectively,forCAP2.Thecorrespondinggenetic gainsperyearforPSversusGSinoptimalenvironmentswere0.084t/haversus0.140t/ha, respectively,forCAP1,and0.123t/haversus0.13t/ha,respectively,forCAP2.Resultsofthis studyconfirmedthatGSofsuperiorplantphenotypesproducedrapidgeneticgainsindrought toleranceinmaize.

GeneticGainsfromRapidSelectionCycleGS:CIMMYTMultiparentalPopulations

Most GS results in maize have been achievedby rapid cycling of biparental populations (e.g.,F2:3segregatingpopulationscrossedwithatesterfromtheoppositeheteroticgroup).

Fiveyearsago,theGlobalMaizeProgramofCIMMYTdesignedaGSrapidcycleof multi-parentalcrosses.Fifteenelitetropicalmaizelineswerecrossedindiallelicfashiontoformcycle 0(C0),whichwasgenotypedusingGBSmarkersandphenotypedattwolocationsinMexico;

plantswiththebestphenotypewereselectedtoformtheparentsforGScycle1(C1).TheC1

parentswereintercrossedandtheprogenygenotypedwiththesameGBSmarkersusedfor theC0population[73].

Twocyclesperyearwerecompletedand,attheendofthesecondyear,seedsfromcyclesC0,

C1,C2,C3,andC4werecollected,assembled,andsown attwolocationsinMexico.Fifty

entries per genomic cycle were sown at each location, together with two widely used commercialtropical maizehybrids.Average genomicgrain yieldgainsreached 0.134t/ha, withC0producing6.653t/ha.GrainyieldofC1wasslightlylower(6.488),andcyclesC2,C3,

andC4producedmeanyieldsof7.022,6.879,and7.126t/ha,respectively.Therealizedgrain

yieldfromC1toC4reached0.225t/hapercycle,whichtheauthorsconsideredequivalentto

0.1t/haperyearoverabreedingperiodof4.5years.Aslightdeclineingeneticdiversitywas detectedatC4comparedwithC0.

ProspectsforEnhancedUseofGSinPlant Breeding

ToacceleratethedeploymentofGSincropbreedingwhilereducingthecostoflineandhybrid development,hereweexaminethecombineduseofGSwithhighthroughputphenotype(HTP) forearly-generationtestinginplantbreeding.Inaddition,weexaminetheapplicationofGSin germplasmenhancementandprebreedingusinggenebankaccessions.

CombiningMulti-TraitMultienvironmentGSwithHigh-ThroughputPhenotyping:The

CIMMYTCase

Inmodernagriculture,high-resolutioncamerasareusedtoobtainhundredsofreflectancedata measuredatdiscretenarrowbands(wavelengths)tocoverthewholespectrum.This informa-tionisusedtoconstructvegetationindices(e.g.,GreenNormalizedDifferenceVegetationIndex orNDVI)topredictprimarytraits(e.g.,grainyield).However,theseindicesuseonly certain bandsandarecultivarspecific;thus,theyfailtocaptureconsiderableinformationorperform robustlyforallcultivars.Theadvantageofthisimagingtechnologyisthatmassivenumbersof phenotypescanbescreenedinexpensivelyduringearly-generationtesting.

ThemainobjectiveofGSisto reducephenotypingcosts byusingmarkersandaccelerate geneticgains,whereastheaimofHTPistomeasurehigh-densityphenotypesinverylarge numbersof individualsorbreeding linesacrosstime andspace using remoteorproximal sensing.Thiscanincreaseboththeaccuracyandintensityofselectionand,subsequently,the selection response, while decreasingphenotyping costs. Themain idea of HTP is to use secondarytraitsrelatedtograinyield,diseaseresistance,orend-usequalitythatmaybeuseful inearly-generationtestingoflines.Arecentstudy[74]foundthatthehighestaccuracywhen predictinggrainyieldisachievedbytheuseofbroaderselectionofwavelengths(see‘Predicting grainyieldusingcanopyhyperspectralreflectanceinwheatbreedingdata’inthe supplemen-taryinformationonline).

(12)

UsingHTPplatformswithvegetationindicesaspredictortraitsinpedigreeandGSmodelscan increasethepredictionaccuracyforgrainyield[75].Predictionduringearly-generationtestingis important to enhancegenetic gains during early stages of the breedingcycle, but GS is economicallyunfeasibleatthosestagesduetothelargenumberofplantsinthefield;thus, whileassessingpedigreerelationshipshastheadvantageofnotcostinganything,theirusefails toexploitMendeliansampling,incontrasttoGS.Therefore,usingmultivariatepedigree-based predictionmodelsthatincorporatesuchpredictortraitswhileusinganHTPplatformoffersa low-costsolutionforpredictinggrainyieldamongandwithinfamiliesinearly-generationtesting, asdemonstratedbyRutkoskietal.[75].Theseauthorsfoundthatwithin-environment sec-ondaryvegetativeindicesincreasedpredictionaccuraciesforgrainyieldby59%usingpedigree relationshipsandby70%usinggenomicrelationships.Theseresultsindicatethatsecondary traitsmeasuredbyHTPandusedwithpedigreecanimprovethepredictionaccuracyofprimary traits during the early stages of breeding. Reynolds and Langridge [76] found that HTP techniquesareeffectiveforevaluatinggeneticresourcesforcomplextraitexpression.

HTPplatformsareusefultomeasuresecondarytraitsthataregeneticallycorrelatedwithgrain yieldandthatcanbeincorporatedinmultivariatepedigreeandgenomicpredictionmodels, improvingindirectselectionforgrainyield.Sunetal.[77]usedastatisticalmodelthatestimated BVsofsecondarytraitstogetherwithmultivariatepedigreeandgenomicmodels;theauthors founda70%(onaverage)improvementintheaccuracyofselectionforgrainyield,including secondarytraitsinbothTRNandTSTpopulations.

Arecentstudydevelopedstatisticalmodels toassesshyperspectralwavelength environ-ment interactions in HTP, incorporating genomic and pedigree GE interactions [78]. AlthoughlittleGPaccuracywasachieved,importanthyperspectralwavelengthenvironment interactionswereobserved,demonstratingthatGScoupledwithHTPcanbeapowerfultool appliedtoearly-generationtestingofalargenumberofselectioncandidates.Thefull condi-tionaldistributionsformodelingthethree-waytraitGEinteractionmodelof Montesinos-Lópezetal.[9]canbeadaptedtoincludethehyperspectralbandsinthefunctionalregression approachrecentlydescribedbyMontesinos-Lópezetal.[74,78].

ExploringtheApplicationofGStoGeneBankAccessionsforGermplasmEnhancement

Althoughtheaccessionsstoredingenebanksrepresentarichresourceforbreeders,alleles needtobeextractedfromtheaccessionsforcultivardevelopment,whichistime-consuming and expensive [79,80]. Lengthy prebreeding programs are requiredto develop lines that combinefavorableallelesfromthegermplasmbankwithgoodagronomicperformanceand that can be used as parents in a breeding program. Based on the simulation of various prebreedingoptions, germplasm enhancement breeding programs can start directly from landracesorfromlandracescrossedwithelitetesters[81].

The GP accuracy of 8416 Mexican wheat landrace accessions and 2403 Iranian wheat landraceaccessions from the CIMMYT gene bank were examined by Crossa et al. [69]. Theauthorsmeasuredtwotraitsintwoenvironmentsandseveralhighlyheritabletraitsina singleoptimumenvironment.TheGPaccuracyforseveraltraits(maturity,qualitytraits,and grainyieldandyieldcomponents)underthedifferentpredictionscenarioswashigh,ranging from0.5to0.7.Insoybean,Jarquínetal.[25]analyzedtheUSDAsoybeancollectionusing differentcross-validationschemesandgroupingfactors(trials,states,andgenetic subpopu-lations); results showed relatively high prediction accuracies that should help breeders to introgressusefulgeneticvariation(Box1).

ThesepreliminaryresultsoftheaccuracyofGPofgenebankaccessionsfavortheideaof applyingGStointrogresslandraceaccessionsinelitegermplasmandformgenepoolsand

(13)

populations suitable for prebreeding and germplasm enhancement programs. However, furtherresearchisrequiredinthisarea.

ConcludingRemarks

Manystatistical methodshave beendevelopedto predictunobservedindividualsinGS.In general,linearmodels(e.g.,GBLUP)andmachine-learningalgorithmshavebeensuccessfulin recognizing complexpatterns andmaking correctdecisions based ondata. Kernel-based methods,suchastheRKHS,haveextensivelydeliveredgoodgenomicpredictionsinplants. SeveralstatisticalmodelsbasedonthestandardGBLUPthatincorporateGEinteractionsin genomicand pedigreepredictions have providedsubstantial increases inthe accuracyof predicting unobserved individuals in environments. These GS prediction models can help scientistsindifferentdisciplinesto developdrought- andheat-tolerant plantsby exploiting positiveGEinteractions.Modelingmulti-traitmulti-environmentisessentialforimprovingthe predictionaccuracyoftheperformanceofnewlydevelopedlinesinfutureyears.

TheuseofstatisticalmodelsinextensivehyperspectralimagetechnologyforHTP,together withgenomicand pedigreeinformation inearly-generationtesting,offers anopportunityto accelerate genetic gains by increasing the intensity of selection. Deep machine-learning methodsusing neuralnetworking appearpromisingto increase the accuracyof genomic-enabledprediction.Genomicselectionhasaclear-cutadvantageoverpedigreebreedingand MAStoenhancegeneticgainsforcomplextraits.Theappropriateuseofgenotypingplatforms combinedwithprecisephenotypingplatformswillalsohelpenhancepredictionaccuracyand accelerategenetic gainsbyshortening thebreeding cycle. Further researchis requiredto incorporateGSwithHTPasaroutinecomponentinplantbreedingprograms.

Developing GP models for gene bankaccessionswill be important to access unexplored diversityandfast-trackusefulportionsintobreedingprograms(alsoseeOutstanding Ques-tions).Currently,GSisthemost-promisingbreedingmethodtospeedthedevelopmentand releaseofnewgenotypes;therefore,theuseofGStoformgenepoolsandpopulationsfrom richgenebankaccessionsmeritsextensiveandintensivestudy,especiallygiventhe vulnera-bilityofelitelinesandhybridstosevereclimatechangeeffects.

Box1.AnExampleofGSDeploymentinPrebreeding

ResultsfromusingalargenumberofMexicanandIranianlandracesstoredinthewheatgenebankofCIMMYTindicated thatpredictionaccuraciesfortraitsevaluatedinoneenvironmentforTRN20-TST80designrangedfrom0.407to0.677 forMexicanlandracesandfrom0.166to0.662forIranianlandraces.Also,predictionaccuraciesofthe20%coreset weresimilartothoseobtainedforTRN20-TST80,rangingfrom0.412to0.654forMexicanlandracesandfrom0.182to 0.647forIranianlandraces.Interestingly,thecorrelationsforcomplexGYSMtraitswereapproximately0.4forboth MexicanandIranianlandraces.ResultsofcorrelationswhenincorporatingGEinteractionsfordaystoheadingand daystomaturityforTRN20-TST80wereapproximately0.61forMexicanlandracesand0.60forIranianlandraces.The 20%coresethadcorrelationsofapproximately0.58forMexicanlandracesand0.50–0.59forIranianlandraces.

Predictionaccuraciesofthe10%and20%diversityandpredictivecoresubsetsweregenerallyofamagnitudethat appearedusefulforpredictingthevalueofgenebankaccessionsandforbreeding.Thefirstapplicationwouldbeto predictthegeneticvalueofallgenotypedaccessionsinagenebankandthenphenotypeonlythosethathavethe highestpredictedgeneticvalues.Then,abreedercouldbeginprebreedingfollowingseveralstrategies.Forexample, onedecisionwouldbe toinitiate a prebreedingconversionapproachbycrossingthe bestaccessionsamong themselvestoimprovetheaccessionsperseuntiltheybecomeelite.Anotherstrategycouldbetostartanintrogression approachbycrossingselectedaccessionstoelitematerials.

Nevertheless,applicationofGSforgermplasmenhancementwillhavetobeperformedincombinationwithstandard introgressionofexotictoelitegermplasmand,possibly,aseriesofbackcrossestotheelitematerial.Extractingthebest accessionsdirectlyfromthegenebanksandformingproductivegenepoolsmaybethefirststagebeforerefiningthe genepoolsandevaluatingthemunderdifferentenvironmentalconditions.

OutstandingQuestions

HowcanGShaveanimportantrolein enhancingtherateofgeneticgain? Are currently available GS models capableofprovidingthemost accu-ratepredictions?

Howcan GS acceleratethe flowof favorableallelesdirectlyfromthegene banktobreedingprograms? Are weready todeployGS in pre-breeding and germplasm-enhance-mentprograms?

IsitpossibletocombineGSand pedi-greeselectionwithHTPtoaccelerate genetic gains and save resources when developing lines during early-generationtesting?

(14)

Acknowledgments

Theauthorsaregratefultotheircolleagues,collaborators,andbyusingfieldandlabtechnicianfromCIMMYTand ICRISAT,aswellasscientistsinNationalProgramswhocollectedthevaluabledatausedinthevariousstudies.The authorswouldliketothankKevinPixley,DanielGianola,andfouranonymousreviewersfortheirpositive,careful,and detailedreviewsofthemanuscript;theircomments,editing,andsuggestionssignificantlyimprovedthequalityand readabilityofthemanuscript.TheauthorsalsothankMichaelListmanforeditingtherevisedversionofthemanuscript.

SupplementalInformation

Supplementalinformationassociatedwiththisarticlecanbefound,intheonlineversion,athttp://dx.doi.org/10.1016/j. tplants.2017.08.011.

References

1. Bernardo,R.(2008)Molecularmarkersandselectionforcomplex traitsinplants:learningfromthelast20years.CropSci.48, 1649–1664

2. Bernardo,R.(2016)BandwagonsI,too,haveknown.Theor. Appl.Genet.129,2323–2332

3. Meuwissen,T.H.E.etal.(2001)Predictionoftotalgeneticvalue usinggenome-widedensemarkermaps.Genetics157, 1819–1829

4. Pszczola,M.etal.(2012)Reliabilityofdirectgenomicvaluesfor animalswithdifferentrelationshipswithinandtothereference population.J.DairySci.95,389–400

5. Daetwyler,H.D.etal.(2010)Theimpactofgeneticarchitectureon genome-wideevaluationmethods.Genetics185,1021–1031

6. Isidro,J.etal.(2015)Trainingsetoptimizationunderpopulation structureingenomicselection.Theor.Appl.Genet.2015,145–158

7. Lorenz,A.J.etal.(2012)Potentialandoptimizationofgenomic selectionforFusariumheadblightresistanceinsix-rowbarley. CropSci.52,1609–1621

8. Arruda,M.P.etal.(2015)Genomicselectionforpredicting fusar-iumheadblightresistanceinawheatbreedingprogram.Plant Genome8,1–12

9. Montesinos-López,O.A.etal.(2016)AgenomicBayesian multi-traitandmulti-environmentmodel.G36,2725–2744

10.Poland,J.andRutkoski,J.(2016)Advancesandchallengesin genomicselectionfordiseaseresistance.Annu.Rev. Phytopa-thol.54,79–98

11.Bellman,R.E.(1961)AdaptiveControlProcesses:AGuidedTour, PrincetonUniversityPress

12.Cuevas,J.etal.(2014)Bayesiangenomic-enabledpredictionas aninverseproblem.G34,1191–2001

13.delosCampos,G.etal.(2012)Wholegenomeregressionand predictionmethodsappliedtoplantandanimalbreeding. Genet-ics193,327–345

14.Cerón-Rojas,J.J.etal.(2015)Agenomicselectionindexapplied tosimulatedandrealdata.G35,2155–2164

15.LeCun,Y.etal.(2015)Deeplearning.Nature521,436–444

16.Gianola,D.etal.(2006)Genomic-assistedpredictionofgenetic valueswithasemi-parametricprocedure.Genetics173, 1761–1776

17.Gianola,D.etal.(2011)Predictingcomplexquantitativetraitswith Bayesianneuralnetworks:acasestudywithJerseycowsand wheat.BMCGenet.12,87

18.González-Camacho,J.M.etal.(2012)Genome-enabled predic-tionofgeneticvaluesusingRadialBasisFunctionNeural Net-works.Theor.Appl.Genet.125,759–771

19.González-Camacho,J.M.etal.(2016)Genome-enabled predic-tionusingprobabilisticneuralnetworkclassifiers.BMCGenomics 17,208

20.Pérez-Rodríguez,P.etal.(2012)Acomparisonbetweenlinear andnon-parametricregressionmodelsforgenome-enabled pre-dictioninwheat.G3:Genes|Genomes|Genetics2,1595–1605

21.Ornella,L.etal.(2014)Genomic-enabledpredictionwithclassi fi-cationalgorithms.Heredity112,616–626

22.Gonzalez-Recio,O.etal.(2014)Machinelearningmethodsand predictiveabilitymetricsforgenome-widepredictionofcomplex traits.Livest.Sci.166,217–231

23.Burgueño,J.etal.(2012)Genomicpredictionofbreedingvalues whenmodelinggenotypeenvironmentinteractionusing pedi-greeanddensemolecularmarkers.CropSci.52,707–719

24.Jarquín,D.etal.(2014)Areactionnormmodelforgenomic selectionusinghigh-dimensional genomicandenvironmental data.Theor.Appl.Genet.127,595–607

25.Jarquín,D.etal.(2016)Genotypingbysequencingforgenomic predictioninasoybeanbreedingpopulation.BMCGenomics15, 740

26.Jarquín,D.etal.(2017)Increasinggenomic-enabledprediction

accuracybymodeling genotypeenvironment interactionin

Kansaswheat.PlantGenomePublishedonlineJune8,2017.

http://dx.doi.org/10.3835/plantgenome2016.12.0130

27.Saint-Pierre,C.etal.(2016)Genomicpredictionmodelsforgrain yieldofspringbreadwheatindiverseagro-ecologicalzones.Sci. Rep.6,27312

28.Bernardo,R.andYu,J.M.(2007)Prospectsforgenome-wide selectionforquantitativetraitsinmaize.CropSci.47,1082–1090

29.Lorenzana,R.E.andBernardo,R.(2009)Accuracyofgenotypic valuepredictionsformarker-basedselectioninbiparentalplant populations.Theor.Appl.Genet.120,151–161

30.Heffner,E.L.etal.(2009)Genomicselectionforcrop improve-ment.CropSci.49,1–12

31.Heffner,E.L.etal.(2010)Plantbreedingwithgenomicselection: gainperunittimeandcost.CropSci.50,1681–1690

32.delosCampos,G.etal.(2009)Predictingquantitativetraitswith regressionmodelsfordensemolecularmarkersandpedigree. Genetics182,375–385

33.delosCampos,G.et al.(2010)Semi-parametric genomic-enabledpredictionofgeneticvaluesusingreproducingkernel Hilbertspacesmethods.Genet.Res.92,295–308

34.Crossa,J.etal.(2010)Predictionofgeneticvaluesofquantitative traitsinplantbreedingusingpedigreeandmolecularmarkers. Genetics186,713–724

35.Crossa,J.etal.(2011)Genomicselectionandpredictioninplant breeding.J.CropImprov.25, 239–226

36.Crossa,J.etal.(2013)Genomicpredictioninmaizebreeding populations with genotyping-by-sequencing. G3: Genes| Genomes|Genetics3,1903–1926

37.Crossa,J.etal.(2014)GenomicpredictioninCIMMYTmaizeand wheatbreedingprograms.Heredity112,48–60

38.Pérez-Rodríguez,P.anddelosCampos,G.(2014) Genome-wideregressionandpredictionwiththeBGLRstatistical pack-age.Genetics198,483–495

39.Hickey,J.M.etal.(2012)Factorsaffectingtheaccuracyof geno-typeimputationinpopulationsfromseveralmaizebreeding pro-grams.CropSci.52,654–663

40.Riedelsheimer,C.etal.(2012)Genomicandmetabolicprediction ofcomplexheterotictraitsinhybridmaize.Nat.Genet.44, 217–220

41.Zhao,Y.etal.(2012)AccuracyofgenomicselectioninEuropean maizeelitebreedingpopulations.Theor.Appl.Genet.124, 769–776

42.Windhausen,V.S.etal.(2012)Effectivenessofgenomic predic-tionofmaizehybridperformanceindifferentbreedingpopulations andenvironments.G3:Genes|Genomes|Genetics2,1427–1436

(15)

43.Technow,F.etal.(2013)Genomicpredictionofnortherncornleaf blightresistanceinmaizewithcombinedorseparatedtraining setsforheteroticgroups.G3:Genes|Genomes|Genetics3, 197–203

44.Massman,J.M.et al.(2013) Genome-wideselectionversus marker-assistedrecurrentselectiontoimprovegrainyieldand stover-qualitytraitsforcellulosicethanolinmaize.CropSci.53, 58–66

45.Asoro,F.G.etal.(2013)Genomic,marker-assisted,and pedi-gree-BLUPselectionmethodsforb-glucanconcentrationinelite oat.CropSci.53,1894–1906

46.Combs,E.andBernardo,R.(2013)Genome-wideselectionto introgress semidwarf corn germplasm into U.S. Corn Belt inbreds.CropSci.53,1427–1436

47.Beyene,Y.etal.(2015)Geneticgainsingrainyieldthrough genomicselectionineightbi-parentalmaizepopulationsunder droughtstress.CropSci.55,154c163

48.Rutkoski,J.etal.(2015)Geneticgainfromphenotypicand genomicselectionforquantitativeresistancetostemrustof wheat.PlantGenome8,1–10

49.VanRaden,P.M.(2007)Genomicmeasuresofrelationshipand inbreeding.InterbullAnnu.Meet.Proc.37,33–36

50.VanRaden,P.M.(2008)Efficientmethodstocomputegenomic predictions.J.DairySci.91,4414–4423

51.Heslot,N.etal.(2012)Genomicselectioninplantbreeding:a comparisonofmodels.CropSci.52,146–160

52.Heslot,N.etal.(2014)Integratingenvironmentalcovariatesand cropmodelingintothegenomicselectionframeworktopredict genotypebyenvironmentinteractions.Theor.Appl.Genet.127, 463–480

53.López-Cruz,M.etal.(2015)Increasedpredictionaccuracyin wheatbreedingtrialsusingamarkerenvironmentinteraction genomicselectionmodel.G3:Genes|Genomes|Genetics5, 569–582

54.Crossa,J.etal.(2016)ExtendingtheMarkerEnvironment interaction model for genomic-enabled prediction and genome-wideassociationanalysesindurumwheat.CropSci. 56,1–17

55.Cuevas, J. et al.(2016) Genomic prediction of genotype

-environment interaction kernel regression models. Plant

GenomePublishedonlineSeptember22,2016.http://dx.doi.

org/10.3835/plantgenome2016.03.0024

56.Cuevas,J.etal.(2017)Bayesiangenomicpredictionwith gen-otypeenvironment interaction kernel models. G3: Genes| Genomes|Genetics7,41–53

57.Montesinos-López, O.A.et al.(2015) Threshold modelsfor genome-enabledpredictionofordinalcategoricaltraitsinplant breeding.G3:Genes|Genomes|Genetics5,291–300

58.Montesinos-López,O.A.etal.(2015)Genomic-enabled predic-tionofordinaldatawithBayesianlogisticordinalregression.G3: Genes|Genomes|Genetics5,2113–2126

59.Montesinos-López,O.A.etal.(2015)Genomicpredictionmodels forcountdata.J.Agric.Biol.Environ.Stat.20,533–554

60.Montesinos-López,O.A.etal.(2016)GenomicBayesian predic-tionmodelforcountdatawithgenotypeenvironment interac-tion.G3:Genes|Genomes|Genetics6,1165–1177

61.Montesinos-López,O.A.etal.(2017)ABayesian poisson-log-normalmodelforcountdataformultiple-trait multiple-environ-ment genomic-enabled prediction. G3: Genes|Genomes| Genetics7,1595–1606

62.Gianola,D.andvanKaam,J.B.C.H.M.(2008)Reproducing ker-nelhilbertspaceregressionmethodsforgenomic-assisted pre-dictionofquantitativetraits.Genetics178,2289–2303

63.Philomin,J.etal.(2017)Comparisonofmodelsand whole-genomeprofilingapproachesforgenomic-enabledprediction ofSeptoriatriticiblotch,Stagonosporanodorumblotch,and tanspotresistanceinwheat.PlantGenome10,1–16

64.Philomin,J.etal.(2017)Genomicandpedigree-basedprediction forleaf,stem,andstriperustresistanceinwheat.Theor.Appl. Genet.130,1415–1430

65.Varshney,R.K.(2016)Excitingjourneyof10yearsfromgenomes tofieldsandmarkets: some successstories of genomics-assistedbreedinginchickpea,pigeonpeaandgroundnut.Plant Sci.242,98–107

66.Roorkiwal,M.etal.(2016)Genome-enabledpredictionmodels foryieldrelatedtraitsinchickpea.Front.PlantSci.7,1666

67.Sukumaran,S.etal.(2017)Genomicpredictionwithpedigree andgenotypeenvironmentinteractioninspringwheatgrownin SouthandWestAsia,NorthAfrica,andMexico.G3:Genes| Genomes|Genetics7, 481–197

68.PérezRodríguez,P.etal.(2015)Apedigreereactionnormmodel forpredictionofcotton(Gossypiumsp.)yieldinmulti-environment trials.CropSci.55,1143–1151

69.Crossa,J.etal.(2016)Genomicpredictionofgenebankwheat landraces.G3:Genes|Genomes|Genetics6,1819–1834

70.Govidan,V.etal.(2016)Genomicpredictionforgrainzincand ironconcentrationsinspringwheat.Theor.Appl.Genet.129, 1595–1605

71.Pérez-Rodríguez,P.etal.(2017)Single-stepgenomicand

pedi-greegenotypeenvironmentinteractionmodelsforpredicting

wheatlinesininternationalenvironments.PlantGenome

Pub-lished online May 5, 2017. http://dx.doi.org/10.3835/

plantgenome2016.09.0089

72.Vivek,B.S.etal.(2017)Useofgenomicestimatedbreeding valuesresultsinrapidgeneticgainsfordroughttolerancein maize.PlantGenome10,1–8

73.Zhang,X.etal.(2017)Rapidcyclinggenomicselectionina multiparentaltropicalmaizepopulation.G3:Genes|Genomes| Genetics7,1–12

74.Montesinos-López,O.A.etal.(2017)Predictinggrainyieldusing canopyhyperspectralreflectanceinwheatbreedingdata.Plant Methods13,4

75.Rutkoski,J.etal.(2016)Predictortraitsfromhigh-throughput phenotypingimproveaccuracyofpedigreeandgenomic selec-tionforgrainyieldinwheat.G36,2799–2808

76.Reynolds,M.andLangridge,P.(2016)Physiologicalbreeding. Curr.Opin.PlantBiol.31,162–171

77.Sun,J.etal.(2017)Multi-trait,randomregression,orsimple

repeatabilitymodelinhigh-throughputphenotypingdataimprove

genomicpredictionforgrainyieldinwheat.PlantGenome

Pub-lished online Mary 18, 2017. http://dx.doi.org/10.3835/

plantgenome2016.11.011

78.Montesinos-López,A.etal.(2017)GenomicBayesianfunctional regressionmodelswithinteractionsforpredictingwheatgrain yieldusinghyper-spectralimagedata.PlantMethods62,13

79.McCouch,S.etal.(2012)Genomicsofgenebank:acasestudy forrice.Am.J.Bot.99,407–423

80.Yu,X.etal.(2016)Genomicpredictioncontributingtoapromising globalstrategytoturbochargegenebanks.Nat.Plants2,16150

81.Gorjanc,G.etal.(2016)Initiatingmaizepre-breedingprograms usinggenomicselectiontoharnesspolygenicvariationfrom landracepopulations.BMCGenomics17,30

References

Related documents

In summary, our data suggest that most organizations are using a hybrid approach when it comes to containers and the cloud, where some services remain on internal infrastructure,

Biphasic dose response of NF- κ B activation (measured by bioluminescence reporter assay) in mouse embryonic fibroblasts 10 hours after different fluences of 810-nm laser light.. CHI

Typically, affordable housing projects in Chicago use a mosaic of funding sources approved by the city council. Low-income housing tax credits available from both the State of

Although research uncovered several critical problems that children experience when using an ubiquitous keyword interface, such as Google [9], almost 80% of children

With the North West being particularly at risk of alcohol- related harm (PHE, 2016), more information is needed on parenting experiences and strategies

The Miami USGS maintains laboratory facilities, such as Field Service Units, mobile labs, and field vehicles for use in preparing equipment for field activities, processing

Thus, this researcher believes that this study had made a contribution to the literature on measuring IS success, and in particular extending the DeLone and McLean model to mobile

To make all COBie2-required IFC properties available for the Component elements in the element Settings dialog ( Tags and Categories > Manage IFC Properties ), use