Feature Review
Genomic Selection in Plant Breeding: Methods, Models, and Perspectives
José Crossa,1,* Paulino Pérez-Rodríguez,2 Jaime Cuevas,3 Osval Montesinos-López,4 Diego Jarquín,5 Gustavo de los Campos,6 Juan Burgueño,1 Juan M. González-Camacho,2 Sergio Pérez-Elizalde,2 Yoseph Beyene,1 Susanne Dreisigacker,1 Ravi Singh,1 Xuecai Zhang,1 Manje Gowda,1 Manish Roorkiwal,7 Jessica Rutkoski,8 and Rajeev K. Varshney7,*
Genomic selection (GS) facilitates the rapid selection of superior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles, and basis of GS and genomic-enabled prediction (GP) as well as the genetics and statistical complexities of GP models, including genomic genotype × environment (G×E) interactions. We also examine the accuracy of GP models and methods for two cereal crops and two legume crops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement (i.e., prebreeding) programs could accelerate the flow of genes from gene bank accessions to elite lines. Recent advances in hyperspectral image technology could be combined with GS and pedigree-assisted breeding.

The Role of Genomic-Enabled Prediction in Plant Breeding
Beginning during the 1980s, the development of different molecular marker systems drastically increased the total number of polymorphic markers available to plant breeders, and to molecular biologists in general. The most notable high-throughput genotyping (HTG) systems are based on single nucleotide polymorphisms (SNPs), which have been used intensively in quantitative trait locus (QTL; see Glossary) discovery. More than 10 000 QTLs using different marker systems have been reported in more than 120 studies covering 12 plant species [1] that aimed to improve quantitative traits of economic importance. Initially, molecular markers were integrated in traditional phenotypic selection (PS) by applying marker-assisted selection (MAS). For simple traits, MAS comprises selecting individuals with QTL-associated markers that have major effects; markers not significantly associated with a trait are not used. However, attempts to improve complex quantitative traits by using QTL-associated marker detection have been unsuccessful due to the difficulty of finding the same QTL across multiple environments (due to QTL × environment interactions) or in different genetic backgrounds [2].
Linkage analysis for QTL mapping is done on biparental populations, but has low power for detecting marker–trait association due to chromosomes with low recombination rates. Therefore, association mapping started during the early 2000s with the objective of overcoming the low power of linkage analysis, thus facilitating the detection of marker–trait associations in nonbiparental populations and fine-mapping chromosome segments with high recombination rates. However, the main problem of fine-association mapping is the low power for detecting rare variants that may be associated with economically important traits [2]. Thus, the challenge of association mapping and QTL detection resides in identifying and quantifying rare QTLs with small effects for economically important traits that are highly affected by the environment. However, because the cost of SNP assays has dramatically decreased, the possibility of using high-density SNP arrays (tens of thousands of markers) has resulted in the development of statistical models to predict marker–trait association accurately, depending on the genetic architecture of the predicted trait.

Trends
In recent years, the global climate has changed, resulting in drastic fluctuations in rainfall patterns and increasing temperature. Sudden climate changes can cause significant economic losses to countries worldwide.
Genetic improvement of several economically important crops during the 20th century using phenotypic, pedigree, and performance data was very successful. However, signs of grain yield stagnation in some crops, especially in drought-stressed and semi-arid regions, are evident.
Genomic selection offers the opportunity to increase grain production in less time. International Maize and Wheat Improvement Center (CIMMYT) maize breeding research in Sub-Saharan Africa, India, and Mexico has shown that genomic selection can reduce the breeding interval cycle to at least half the conventional time and produces lines that, in hybrid combinations, significantly increase grain yield performance over that of commercial checks.
Public and private investment in crop genomic selection research should increase to successfully develop in less time germplasm that is adapted to sudden climate change.
Contrary to QTL and association mapping, GS uses all molecular markers for GP of the performance of the candidates for selection. Therefore, the aim of GS is to predict breeding and/or genetic values. GS combines molecular and phenotypic data in a training population (TRN) to obtain the genomic estimated breeding values (GEBVs) of individuals in a testing population (TST) that have been genotyped but not phenotyped [3]. Figure 1A depicts the two basic populations in a GS program: the TRN data, whose phenotype and genotype are known, and the TST data, whose genetic values are to be predicted. GS is used in place of phenotyping for a few selection cycles. The main advantages of GS over phenotype-based selection in breeding are that it reduces the cost per cycle and the time required for variety development. In terms of cost reduction in maize breeding, the breeder can testcross 50% of all available lines, evaluating them in first-stage multi-location trials, and can then use the phenotypic data to predict the remaining 50% by GS. Figure 1B shows the advantage of GS over PS for: (i) reducing costs, up to 50%; and (ii) saving time by selecting lines directly for stage II instead of going through stage I (used in PS). This significantly reduces the cost of testcross formation and evaluation at each stage of multi-location evaluations. The time efficiency over PS could come from the second cycle of selection, which uses the TRN from the previous cycle to predict the new doubled haploid (DH) lines, thus excluding testcross formation and first-stage multi-location evaluation trials. Based on GS, the best lines could go directly to the second stage of multi-location evaluations.
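The TRN/TST workflow described above can be illustrated with a minimal sketch on simulated data. It assumes ridge regression on centered marker codes as a stand-in for an additive GP model; the function name `ridge_marker_effects`, the population sizes, and the penalty `lam` are hypothetical choices for illustration, not values from the studies reviewed here.

```python
import numpy as np

def ridge_marker_effects(X_trn, y_trn, lam):
    """Ridge estimates of marker effects, solved in the n x n dual form (cheap when p > n)."""
    n = X_trn.shape[0]
    mu = y_trn.mean()
    # beta = X' (X X' + lam I)^-1 (y - mu)
    alpha = np.linalg.solve(X_trn @ X_trn.T + lam * np.eye(n), y_trn - mu)
    return mu, X_trn.T @ alpha

rng = np.random.default_rng(1)
n_trn, n_tst, p = 400, 100, 300                 # hypothetical TRN size, TST size, marker number
X = rng.binomial(2, 0.3, size=(n_trn + n_tst, p)).astype(float)
X -= X.mean(axis=0)                             # center the 0/1/2 marker codes
true_beta = rng.normal(0.0, 0.1, size=p)        # simulated small additive effects
g = X @ true_beta                               # simulated genetic values
y = g + rng.normal(0.0, g.std(), size=g.size)   # phenotypes with heritability near 0.5

# TRN: genotyped and phenotyped. TST: genotyped only.
X_trn, y_trn, X_tst = X[:n_trn], y[:n_trn], X[n_trn:]
lam = 100.0                                     # assumed penalty (residual / marker variance ratio)
mu, beta_hat = ridge_marker_effects(X_trn, y_trn, lam)
gebv_tst = mu + X_tst @ beta_hat                # predicted values for TST candidates
accuracy = np.corrcoef(gebv_tst, g[n_trn:])[0, 1]
print(f"correlation of GEBV with simulated genetic value in TST: {accuracy:.2f}")
```

In practice the penalty would be estimated from the data (for example, from variance components) rather than fixed in advance, and accuracy would be assessed against observed phenotypes because true genetic values are unknown.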
GS predicts the breeding values (BVs) of the candidates for selection. BVs have two components: the parental average (the mean BV of both parents) and the deviation of progeny performance from this average that is due to Mendelian sampling. In conventional breeding, the parental average is quantified by pedigree information (if the genealogy is available), from which a relationship matrix A between the individuals can be derived. Mendelian sampling assesses within-family variation that is quantified by testing the progeny in multienvironment field trials. GS takes advantage of dense markers to quantify Mendelian sampling, thus avoiding the need to extensively phenotype the progeny. This saves time by reducing the cycle length, while enhancing the expected genetic gain and the selection response per unit time; it also uses fewer resources compared with extensive phenotyping. GS has the potential to quickly improve complex traits with low heritability as well as to significantly reduce the cost of line and hybrid development. GS can also be used for simple traits with higher heritability than complex traits, for which high GP accuracy is expected. The application of GS in plant breeding could be limited by two main factors: (i) genotyping costs; and (ii) unclear guidelines as to where GS can be efficiently applied in a breeding program.
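The two ideas in this paragraph can be written compactly using standard quantitative-genetics expressions. These are textbook forms rather than equations taken from this review, and the symbols ($m_k$, $i$, $r$, $\sigma_A$, $L$) are notation assumed here for illustration.

```latex
% Breeding value of progeny k from parents s and d:
% parental average plus a Mendelian sampling deviation m_k
\mathrm{BV}_k = \tfrac{1}{2}\left(\mathrm{BV}_s + \mathrm{BV}_d\right) + m_k

% Expected response to selection per unit time:
% i = selection intensity, r = accuracy of the selection criterion,
% \sigma_A = additive genetic standard deviation, L = cycle length
\Delta G = \frac{i \, r \, \sigma_A}{L}
```

Under this reading, GS can raise the response per unit time mainly by shortening $L$, even when its accuracy $r$ is somewhat lower than that of extensive phenotyping.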
GS and GP have been applied using two different approaches. One focuses on predicting additive effects in early generations of a breeding program (F2:3) to achieve a rapid selection cycle with a short interval (i.e., GS at the F2 level of a biparental cross). In this case, researchers are interested in predicting the additive values (BVs) rather than the total genetic value; therefore, additive linear models that summarize the effects of the markers are sufficient. The other approach predicts the complete genetic values of individuals considering both additive and nonadditive (dominance and epistasis) effects, thereby estimating the performance (commercial value) of the cultivars. Genetic values of lines are predicted for some environments using an incomplete (sparse) multienvironment testing scheme.

1 International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, 06600, Mexico City, Mexico
2 Colegio de Postgraduados, Montecillo, Texcoco, 56230, Edo. de Mexico, Mexico
3 Universidad de Quintana Roo, Quintana Roo, 77019, Mexico
4 Facultad de Telemática, Universidad de Colima, Colima, 28040, Mexico
5 Department of Agronomy and Horticulture, University of Nebraska-Lincoln, 321 Keim Hall, Lincoln, NE 68503-0915, USA
6 Department of Epidemiology & Biostatistics, Michigan State University, 909 Fee Road, Room B601, East Lansing, MI 48824, USA
7 International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru 502324, Telangana, India
8 International Rice Research Institute, Los Baños, 4030, Philippines
*Correspondence: [email protected] (J. Crossa) and

Glossary
Breeding value (BV): the worth of an individual as estimated by the average performance of its progenies.
Array: high-throughput genotyping platform containing either genome-wide SNPs (SNP array) or coding variants (exome array).
Genetic gain: the amount of increase in performance that is achieved through genetic improvement programs between consecutive selection cycles.
Genomic estimated breeding value (GEBV): breeding value of an individual calculated using genome-wide marker data to capture small genetic effects dispersed over the genome.
Genomic selection (GS): a type of breeding methodology that selects the best candidates as parents for the next selection cycle based on their predicted breeding values computed given: (i) their genotypes; and (ii) the phenotypes and genotypes of their relatives.
Genotyping-by-sequencing (GBS): a highly multiplex genotyping system that involves DNA digestion with enzymes followed by construction of a reduced representation library, which is sequenced using a next-generation sequencing platform.
Heterotic group: a group of genotypes, irrespective of their genetic relatedness, that display similar combining ability and heterotic response when crossed with genotypes from other genetically distinct germplasm of the same group.
Linkage disequilibrium (LD): the nonrandom associations of alleles at two loci that can occur even if the two genes are unlinked.
Marker-assisted selection (MAS): a process for selecting individuals based on trait-linked markers.
Marker-assisted recurrent selection (MARS): a type of MAS that allows the simultaneous identification of associated alleles as well as the accumulation of superior alleles from both parents.
Quantitative trait locus (QTL): a gene (DNA section also called locus) that is associated with variation in a phenotype.

[Figure 1. Panel A, 'TRN and TST populations in genomic selection': the TRN population is phenotyped and genotyped with genome-wide markers to develop a prediction equation; the TST population is genotyped with genome-wide markers and its breeding values are predicted. Panel B: year-by-season timeline (years 1–6, seasons A and B) comparing phenotypic selection with phenotypic + genomic selection in a maize doubled haploid pipeline, from the A x B cross through haploid induction, DH seed increase (plus genotyping under GS), and first- and second-stage multi-location evaluation of testcrosses.]

Figure 1. Populations Used in Genomic Selection and a Scheme of Phenotypic and Genomic Selection in Maize Breeding. (A) Genomic selection (GS) requires a training population (TRN) that has been genotyped and phenotyped and a testing population (TST) that has only been genotyped but not phenotyped. (B) Reduction of cycle length through GS in maize using doubled haploids (DH) crossed with a tester (TC).
Several genetic and statistical factors complicate the practical application of GP. Genetic difficulties arise from the size and diversity of the TRN population and the heritability of the traits to be predicted. Statistical challenges are related to the high dimensionality of marker data, where the number of markers (p) is much larger than the number of observations (n) (p >> n), and the multicollinearity among markers (adjacent markers are highly correlated). For more details, see ‘The complexity of genomic selection and prediction’ and ‘Solution to an inverse problem’ in the supplementary information online.
Here, we review advances in GS and GP theory in light of the above considerations and evaluate recent examples from GP applied to cereal and legume breeding programs. We describe the evolution and main features of GP models, including complexities, strengths, and weaknesses. We then illustrate the use of GP using examples from crop breeding programs with genomic G×E interactions, as well as results of genetic gains from rapid cycle GS in maize. We also speculate on the prospects for GS and GP in plant breeding. Most of the results presented in this review include studies performed on maize and wheat from the International Maize and Wheat Improvement Center (CIMMYT), as well as on chickpea from the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT).
Genomic-Enabled Prediction Models and Applications: Coping with Complexity
The complexity of applying GP in breeding occurs at different levels and is influenced by several factors. When a trait is affected by a large number of loci, GP accuracy depends on several genetic factors: (i) the size and genetic diversity of the TRN population and its relationship with the TST population [4]; that is, whether the cultivars in the TRN are relatives (close and/or distant) of cultivars in the TST set; (ii) the heritability of the trait(s) under selection (complex traits with low heritability and small marker effects are suitable for GS and GP, whereas less complex traits with high heritability can be predicted by a few markers with relatively large effects); and (iii) the linkage disequilibrium (LD) between markers and QTL: for complex traits with large numbers of markers that are not in LD with the QTL, GP accuracy is lower [5], and it increases when the heritability and TRN size increase. Studies have shown the importance of selecting an appropriate TRN population that optimizes the accuracy of the predictions of the nonphenotyped cultivars in the TST set [6]. Depending on the trait, the increase in GP accuracy reaches a plateau as the population size increases. A similar trend was found for the number of markers [7,8].
One important genetic-statistical complexity of GP models arises when predicting nonphenotyped individuals in specific environments (site–year combinations) by incorporating G×E interactions into the statistical models. Equally important is the genomic complexity related to G×E interactions for multiple traits; these interactions create trait and environmental structures that should be dealt with by using statistical-genetic models that exploit multi-trait, multi-environment variance-covariance and genetic correlations between environments, between traits, and between traits and environments, simultaneously. Untangling the complexity of multi-trait genomics and multiple environments requires a theoretical framework that accounts for these complex interactions [9] (see ‘Bayesian multi-trait multienvironment genomic model for normal phenotypes’ in the supplementary information online). Interestingly, the use of GP to improve disease resistance has been challenging in wheat for two reasons: (i) selection for major resistance genes can be ephemeral due to changes in pathogen races; and (ii) breeding for minor resistance genes with small effects through GS (which provides durable resistance) may face the usual complexities encountered in GS [10].
Another level of complexity occurs in GS statistical prediction models because the number of markers (p) is larger than the population size (n) and the predictors (markers) are highly correlated. This situation results in a matrix of predictors that is rank deficient, making it impossible to compute unique least-squares estimates for marker effects. The complexity arises from factors such as the curse of dimensionality [11]; that is, models with p >> n are not likelihood identified and are prone to overfitting, so spurious features and data structures may be captured (see ‘The complexity of genomic selection and prediction’ and ‘Solution to an inverse problem’ in the supplementary information online). Solutions to these problems include the use of: (i) penalized regression; (ii) variable selection; and (iii) dimensionality reduction (e.g., principal components), such that a new set of predictors that are not correlated is generated from the original one (markers), thus allowing the use of univariate distributions and decreasing the computation time of the estimates and the prediction [12]. A fourth solution is to use statistical models that assess GP complexities and high-density marker platforms with G×E interactions, thereby adding power to the GP models (see the next section).
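The rank deficiency described above, and solutions (i) and (iii), can be checked numerically. The following is a minimal sketch on simulated markers; the sizes `n` and `p`, the penalty `lam`, and the number of components `k` are hypothetical values chosen only to make the point visible.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 500                                   # hypothetical sizes with p >> n
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
X -= X.mean(axis=0)                              # centered marker codes
y = X[:, :5].sum(axis=1) + rng.normal(size=n)    # simulated phenotype
y -= y.mean()

# X'X is p x p, but its rank cannot exceed n, so least squares has no unique solution.
rank_xtx = np.linalg.matrix_rank(X.T @ X)

# (i) Penalized regression: adding lam * I makes the system full rank and solvable.
lam = 10.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# (iii) Dimensionality reduction: principal-component scores are mutually uncorrelated,
# so each component can be fitted with its own univariate regression.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 10
scores = U[:, :k] * s[:k]
gamma = (scores.T @ y) / s[:k] ** 2              # one regression coefficient per component

print(f"rank of X'X: {rank_xtx} (p = {p})")
```

Variable selection, solution (ii), is not shown; it would replace the uniform penalty with a prior or penalty that sets many marker effects to zero.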
GP models based on basic quantitative genetics describe the phenotypic response as the sum of a genetic value (linear additive models) and a residual value. A large body of GP research has focused on developing efficient parametric and nonparametric statistical and computational models with increased accuracy for predicting nonphenotyped genotypes [13]. In general, these theoretical studies show reasonably good prediction accuracies for complex traits such as grain yield and other traits evaluated by means of independent random cross-validation data partitioning. In contrast to the widespread use of GP to predict the performance of one trait in the TST populations using data from the same trait observed in the TRN populations, the complexity of extending this to multi-trait GP indices has not received much attention, except for a method proposed by Cerón-Rojas et al. [14] that is based on the multi-trait Genomic Best Linear Unbiased Predictor (GBLUP) selection index, which worked well when applied to simulated and real data sets.
With advances in GS and GP, data volumes and complexity have increased dramatically, leading to novel interdisciplinary research efforts to integrate computer science, machine learning, mathematics, physics, statistics, genetics and quantitative genetics, and bioinformatics. Such work has emerged as a new field of research (commonly known as ‘data science’ or data-driven science) that aims to unify statistics with data analysis, data mining, and so on. The interdisciplinary researchers in data science focus on computing more accurate predictive values by using statistical models or machine-learning models (R. McDowell, MSc thesis, Iowa State University, 2016). Neural network methods are common prediction tools in machine learning. Neural networks comprise layers of interconnected neurons, where the output of each neuron is computed from the inputs it receives from the neurons of the previous network layer: each input is multiplied by a weight, the weighted inputs are summed together with a bias, and the resulting sum is transformed by an activation function.
When neural networks are applied to GP, the input layer comprises the markers, with one neuron per marker; each of the neurons (markers) in the input layer is connected to all the neurons in the first hidden layer, and these are connected to all neurons in the second hidden layer, and so on, up to the output layer, which gives the prediction of each of the phenotypes. Recent developments in neural networks and speedier computer processing have allowed the addition of new layers to the neural network (deep machine learning) to capture small cryptic correlations between inputs [15], which in GP are interactions between markers. Initial applications of machine learning and neural networks in GP were demonstrated by Gianola et al. [16,17], González-Camacho et al. [18,19], Pérez-Rodríguez et al. [20], Ornella et al. [21], and González-Recio et al. [22]. Recent results for deep machine learning applied to GP can be found elsewhere (R. McDowell, MSc thesis, Iowa State University, 2016).
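The layer-by-layer computation described above can be sketched as a forward pass through a small fully connected network. This is an illustration only: the layer sizes, the `tanh` activation, the linear output neuron, and the random untrained weights are assumptions, and no training step is shown.

```python
import numpy as np

def forward(x, layers):
    """Forward pass: hidden layers apply tanh(W @ a + b); the output layer is linear."""
    a = x
    for idx, (W, b) in enumerate(layers):
        z = W @ a + b                            # weighted sum of the inputs plus a bias
        a = z if idx == len(layers) - 1 else np.tanh(z)
    return a

rng = np.random.default_rng(0)
sizes = [100, 8, 4, 1]                           # hypothetical: 100 markers, two hidden layers, one output
layers = [(rng.normal(0.0, 0.1, size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = rng.binomial(2, 0.5, size=sizes[0]).astype(float)   # one individual's marker codes
y_hat = forward(x, layers)                       # predicted phenotype (untrained weights)
print(y_hat.shape)
```

Because every hidden neuron combines all markers before the nonlinearity is applied, stacked layers can represent interactions between markers that a purely additive model cannot.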
The Accuracy of GP Models, and Genetic Gains Achieved by GS
The GP models assess different prediction problems that attempt to mimic what happens when predictions are made in real situations. Different random cross-validation schemes have been designed to simulate the prediction problems that researchers may face when performing GS. There are four basic scenarios arising from combinations of tested (observed) lines (LT) or untested (unobserved) lines (LU) with tested (observed) environments (ET) or untested (unobserved) environments (EU). Predicting newly developed lines (or cultivars) that have never been tested, in environments where other lines were tested, is a case of LU-ET (random cross-validation 1, CV1). Another problem is to predict lines that were tested in some environments but not in others; this is LT-ET (random cross-validation 2, CV2), which attempts to mimic one of the objectives of GP: sparse testing. Another problem comprises predicting tested lines in untested environments; that is, LT-EU (random cross-validation 0, CV0). Finally, there is the problem of predicting lines never observed in never-observed environments, LU-EU (cross-validation 00, CV00) [23–27].
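The four scenarios can be made concrete as train/test masks over a lines × environments grid. The sketch below is one possible way to generate a single random partition per scheme; the function name `cv_split`, the held-out fraction, and the choice of a single held-out environment are assumptions for illustration, not the partitioning used in the cited studies.

```python
import numpy as np

def cv_split(n_lines, n_envs, scheme, frac=0.2, seed=0):
    """Return boolean (lines x environments) masks (train, test) for one random partition."""
    rng = np.random.default_rng(seed)
    n_held = max(1, int(frac * n_lines))
    held_lines = rng.choice(n_lines, size=n_held, replace=False)
    held_env = int(rng.integers(n_envs))
    test = np.zeros((n_lines, n_envs), dtype=bool)
    if scheme == "CV1":        # LU-ET: held-out lines are unobserved in every environment
        test[held_lines, :] = True
        train = ~test
    elif scheme == "CV2":      # LT-ET: each held-out line is missing in one environment only
        test[held_lines, rng.integers(n_envs, size=n_held)] = True
        train = ~test
    elif scheme == "CV0":      # LT-EU: one whole environment is unobserved
        test[:, held_env] = True
        train = ~test
    elif scheme == "CV00":     # LU-EU: held-out lines in a held-out environment
        test[held_lines, held_env] = True
        train = np.ones_like(test)
        train[held_lines, :] = False             # these lines are never seen in training
        train[:, held_env] = False               # this environment is never seen in training
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return train, test

for scheme in ("CV1", "CV2", "CV0", "CV00"):
    train, test = cv_split(n_lines=20, n_envs=4, scheme=scheme)
    print(scheme, "train cells:", int(train.sum()), "test cells:", int(test.sum()))
```

Note that under CV00 some cells belong to neither set, because records of held-out lines in other environments and of other lines in the held-out environment must both be discarded from training.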
Simulation and empirical results obtained by random cross-validation suggest that GS enhances genetic gains by shortening the breeding cycle (rapid selection cycle) and/or enhancing testing efficiency in field evaluations [3,28–31]. Results of using random cross-validation on maize and wheat breeding data indicate that GS can significantly enhance prediction accuracy relative to pedigree and MAS for low-heritability traits [13,18–20,32–43]. Results of applying GS in maize and wheat breeding indicate its effectiveness in selection [44–48].
Breeding programs worldwide have been studying and applying GS and GP in several crops. In parallel, extensive research has resulted in novel statistical methods that incorporate pedigree, genomic, and environmental covariates (e.g., weather data) into statistical-genetic prediction models. GBLUP models [49,50] are widely used in GP, and the extension of GBLUP for incorporating G×E interactions has improved the accuracy of predicting unobserved cultivars in environments [23,24,51–56]. New models for assessing the GP accuracy of discrete response variables (e.g., ordinal disease data, such as rates, count data, and so on) were proposed [57–61] together with Bayesian genomic models for analyzing multiple traits and multiple environments. A computationally efficient Markov Chain Monte Carlo (MCMC) method that produces full conditional distributions of the parameters, leading to exact Gibbs sampling for the posterior distribution, has also been developed [9]. Results from simulated and (two) extensive data sets show that, when the correlation between the traits is high, a proposed model with an unstructured covariance matrix is preferred over the diagonal and standard methods to help improve the prediction accuracy for grain yield. However, when correlations are low, it is enough to use the standard model [9].
Depending on the complexity of the trait and the prediction scenario, more sophisticated models result in moderate-to-high gains in prediction accuracy. In several studies, complex models increased prediction accuracy by >10% at no additional cost. The use of simple models may miss important data features and cause losses of prediction accuracy for complex traits, where nonlinear models usually give significantly higher prediction accuracy than linear models [20,36,55,56]. One of the first assessments of GP in wheat breeding was performed on a collection of 599 wheat lines evaluated in four environments using pedigree and genomic information with two different models [34]. The models were the standard GBLUP with the linear genomic matrix G and a nonparametric model, Reproducing Kernel Hilbert Spaces (RKHS) regression with a nonlinear genomic matrix, the Gaussian kernel (GK) [62]. The most complex model, RKHS using a GK nonlinear kernel and including pedigree and marker information, gave the highest prediction accuracy, with gains ranging from 15% to 36% with respect to the pedigree model alone. The GP model RKHS with GK, pedigree, and markers has been used for predicting resistance to leaf, stem, and stripe rust, septoria, tan spot, and Stagonospora nodorum blotch in wheat; compared with standard least-squares multiple regression methods, RKHS gave increases in accuracy of 42% and 48% in the two reported studies [63,64], respectively.
Based on the encouraging results obtained in some major cereals, preliminary steps have been taken to deploy GS to develop superior lines more quickly and enhance the rate of genetic gain in a few legume crops, such as pea, soybean, chickpea, groundnut, and pigeon pea [65]. The recent availability of cost-effective, high-throughput sequencing has facilitated the development of large-scale genomic resources in most legumes. Soybean was the first legume crop where GS was deployed for improving yield and agronomic traits using genotyping-by-sequencing (GBS) in a breeding program [25]. To assess the utility of GS in soybean breeding programs and understand the effect of marker selection and genotype imputation, prediction accuracies were calculated for two genomic prediction models, namely, the standard GBLUP model with additive effects and an extended GBLUP with additive-by-additive epistasis. High prediction accuracy (0.64) indicated the potential of using GS to improve grain yield [25].
In the case of chickpea, a collection of 320 elite breeding lines was genotyped using Diversity Array technology (DArTseq) and phenotyped for yield-related traits in two environments with two different treatments (i.e., rainfed and irrigated) in two different seasons [65,66]. Various statistical models (RR-BLUP, Kinship GAUSS, BayesCπ, Bayes B, Bayesian LASSO, and random forest regression or RFR) resulted in high prediction accuracies for the traits of interest; however, not much variation in prediction accuracy among the different models was observed [65,66]. When population structure was included in the model, prediction accuracies improved slightly for days to maturity (DM), days to flowering (DF), and seed dry weight (SDW), but not for seed yield (SY).
In general, early statistical models developed for GP in animal breeding were based on single-environment assessments. However, in plant breeding, G×E interactions are of paramount importance. Just as G×E interactions are a fundamental challenge in plant breeding, they are also increasingly recognized as a major complexity in GP models.
GP Incorporating Genotype × Environment Interaction
Including high-density marker platforms with G×E interactions increases the accuracy of GP models; this has been extensively studied in bread wheat, maize, and legumes [23–27,55,56,67]. In all GP models that incorporate G×E interactions, accuracy with respect to single-environment analyses increased 10–40% on average in all three crop species. The main models used to assess GP accuracy by incorporating G×E interactions and their application to real data are described below.
Multienvironment trials for assessing G×E interactions have an important role in plant breeding for selecting high-performing and stable lines across environments. Burgueño et al. [23] were the first to use marker- and pedigree-based GBLUP models for assessing G×E interactions under genomic prediction, while Heslot et al. [52] incorporated crop-modeling data to study genomic G×E interactions. A reaction norm model, where the main and interaction effects of markers and environmental covariates are introduced using high-dimensional random variance-covariance structures of markers and environmental covariates, was developed by Jarquín et al. [24] as an extension of the well-known GBLUP model.
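The baseline model for phenotypes evaluated in different environments ($y_{ij}$) can be described using Equation 1:

$$y_{ij} = \mu + E_i + L_j + EL_{ij} + e_{ij} \qquad (1)$$

where $\mu$ is the overall mean, $E_i$ ($i = 1, \ldots, I$) is the random effect of the $i$th environment, $L_j$ is the random effect of the $j$th line ($j = 1, \ldots, J$), $EL_{ij}$ is the interaction between the $i$th environment and the $j$th line, and $e_{ij}$ is the random error term. The assumptions are as follows: $E_i \overset{iid}{\sim} N(0, \sigma^2_E)$, $L_j \overset{iid}{\sim} N(0, \sigma^2_L)$, $EL_{ij} \overset{iid}{\sim} N(0, \sigma^2_{EL})$, and $e_{ij} \overset{iid}{\sim} N(0, \sigma^2_e)$, with $N(\cdot,\cdot)$ denoting a normal distribution, and ‘iid’ standing for independent and identically distributed. However, when the number of environments is small, it may be better to treat the environment as a fixed effect.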
Markers can be introduced in Equation 1 such that the effect of the line ($L_j$) can be replaced by $g_j$, expressed as a linear regression on marker covariates (it approximates the genetic value of the $j$th line). The vector containing the ‘genomic values’ is $\mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\sigma^2_g)$, where $\sigma^2_g$ is the genomic variance, and $\mathbf{G}$ is a genomic relationship matrix. Also, the effect of the line ($L_j$) can be replaced by $a_j$, with $\mathbf{a} \sim N(\mathbf{0}, \mathbf{A}\sigma^2_a)$, where $\mathbf{A}$ is the additive (numerator) relationship matrix derived from pedigree, and $\sigma^2_a$ is the additive variance. The interaction covariance matrix is the Hadamard product of two covariance structures, one describing relationships between lines based on genetic information (pedigree or genomic) and the other relating environments by means of environmental covariates. When the environmental effect is assumed to be fixed, the interaction covariance is the Hadamard product of the incidence structure of the environments and the covariance matrix of lines that was built with the genetic information. When environmental covariables are used, the name ‘reaction norm’ is justified because the genotypic effect is a reaction to those environmental covariables, whereas, when environmental covariables are not used, the reaction norm models have unknown environmental deviations.
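The covariance structures described above can be assembled in a few lines. The sketch below is illustrative only: it assumes a simple centered cross-product form for the genomic relationship matrix (other scalings are in common use), a hypothetical matrix `W` of standardized environmental covariates, and balanced data sorted by environment; all names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
J, I, p, q = 6, 3, 200, 4                  # hypothetical: lines, environments, markers, covariates

M = rng.binomial(2, 0.4, size=(J, p)).astype(float)
M -= M.mean(axis=0)                        # centered marker codes
G = M @ M.T / p                            # assumed simple form of the genomic relationship matrix

W = rng.normal(size=(I, q))                # hypothetical environmental covariates
W = (W - W.mean(axis=0)) / W.std(axis=0)
Omega = W @ W.T / q                        # similarity between environments

# One record per line-environment combination, sorted by environment.
env = np.repeat(np.arange(I), J)
line = np.tile(np.arange(J), I)
Z_L = np.eye(J)[line]                      # incidence matrix connecting records to lines
Z_E = np.eye(I)[env]                       # incidence matrix connecting records to environments

K_g = Z_L @ G @ Z_L.T                      # covariance of the genomic main effect
K_e = Z_E @ Omega @ Z_E.T                  # covariance induced by environmental covariates
K_ge = K_g * K_e                           # Hadamard (element-wise) product: interaction term
print(K_ge.shape)
```

For balanced data ordered this way, the Hadamard product of the two expanded matrices equals the Kronecker product of `Omega` and `G`, which is a convenient check on the construction.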
This reaction norm model [24] has been applied successfully using pedigree and molecular markers in multienvironment trials with added environmental covariates, for example, in cotton trials with environmental covariates [68], in GP of extensive wheat gene bank accessions [69], in GP of Fe and Zn in wheat grain [70], in GP of bread wheat lines in sites located in diverse agroecological zones [25,27], in GP of wheat lines evaluated in Mexico and predicted in locations in South Asia [71], and in GP of extensive field trials in wheat on different continents [67]. The reaction norm model was also extended to G×E interactions with maize and wheat disease ordinal and count data by Montesinos-López et al. [57–60] (see ‘Bayesian genomic-enabled prediction models for ordinal and count data incorporating genotype × environment interaction’ in the supplementary information online). The increase in GP accuracy of the reaction norm model with G×E interactions was on average 7–20% relative to the prediction accuracy of the GBLUP without including G×E interactions.
In areas such as Kansas, USA, wheat production is impacted by yearly climate factors, such as extreme temperatures and erratic precipitation. Yearly effects are not repeatable and represent the dynamic part of the G×E interactions, whereas site effects represent the static repeatable component. The accuracy of predicting unobserved historical sites in the wheat breeding program of Kansas State University reaches 0.54, but unobserved years can be predicted with only 0.17 accuracy [26]. Results of using the reaction norm model with pedigree and genomic information for predicting 400–500 bread wheat lines in South Asian locations, using approximately 60 000 lines trained in different Mexican environments, indicated that GP is more accurate (0.35) than PS (0.20) for predicting unobserved wheat lines in different South Asian locations [71]; perhaps this last result is the simplest proof of the concept that GP works better than PS.
GP Incorporating Marker × Environment Interaction Models
The G×E interaction model described by López-Cruz et al. [53] decomposes the marker effects into components that are common across environments (stability) and environment-specific deviations (interaction) [see ‘The G×E (or M×E) model with linear kernel’ in the supplementary information online]. This model borrows information from across environments while allowing marker effects to change in each environment. It can be implemented using shrinkage methods as well as variable selection methods and, thus, can be used to identify genomic regions whose effects are stable across environments and other regions that are responsible for G×E interactions [54]. The G×E interaction model of López-Cruz et al. [53] is best suited for the joint analysis of positively correlated environments and was used to analyze three CIMMYT wheat data sets. The prediction accuracy of the G×E interaction model was 5–29% greater than that of across-environment or single-environment analyses when predicting each of the environments. Recently, this genomic G×E interaction model was used to predict untested durum wheat lines in environments, together with a variable selection model to identify genomic regions whose effects are stable across environments and others that are environment specific [54].
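The decomposition of marker effects into a shared component and environment-specific deviations corresponds to a particular design-matrix layout, sketched below. This is a structural illustration under assumed names (`me_design`, `b0`, `b_env`) and simulated data; it does not estimate effects, which in the cited work is done with shrinkage or variable selection methods.

```python
import numpy as np

def me_design(X_by_env):
    """Build design blocks for shared (main) and environment-specific marker effects."""
    p = X_by_env[0].shape[1]
    X_main = np.vstack(X_by_env)                        # multiplies the shared effects b0
    X_int = np.zeros((X_main.shape[0], p * len(X_by_env)))
    row = 0
    for j, X in enumerate(X_by_env):                    # block-diagonal: X_j multiplies b_j only
        X_int[row:row + X.shape[0], j * p:(j + 1) * p] = X
        row += X.shape[0]
    return X_main, X_int

rng = np.random.default_rng(0)
p = 20
X_by_env = [rng.normal(size=(n_j, p)) for n_j in (5, 7, 6)]   # hypothetical marker matrices
b0 = rng.normal(size=p)                                 # effects stable across environments
b_env = [rng.normal(scale=0.3, size=p) for _ in X_by_env]     # environment-specific deviations

X_main, X_int = me_design(X_by_env)
g_stacked = X_main @ b0 + X_int @ np.concatenate(b_env)
g_direct = np.concatenate([X @ (b0 + b) for X, b in zip(X_by_env, b_env)])
print(np.allclose(g_stacked, g_direct))
```

Because the shared effects `b0` enter every environment, the implied genetic covariance between environments is non-negative, which is consistent with the model being best suited to positively correlated environments.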
In the models of Jarquín et al. [24] and López-Cruz et al. [53], the kernel used is the linear kernel GBLUP. In a new study, Cuevas et al. [55] proposed a G×E interaction model similar to that of López-Cruz et al. [53] but with a nonlinear kernel, the Gaussian kernel (GK), which is similar to that used in the RKHS [62] [see ‘The G×E model of López-Cruz et al. (2015) with a non-linear Gaussian kernel method’ in the supplementary information online]. Using two extensive data sets, the authors found that, for the wheat data sets, the GK gave prediction accuracies up to 17% higher than the GBLUP linear kernel. For the maize data set, the prediction accuracy of the GK was on average 5–6% higher than that of the GBLUP linear kernel. The advantage of the GK over the GBLUP is that it is a more flexible kernel that accounts for small and complex marker main effects and specific interactions.

One weakness of both G×E interaction models with a linear [53] and nonlinear kernel [55] is that positive correlations between environments are assumed. When the correlation between environments is low or negative, these models do not increase the prediction accuracy of environments with negligible or negative correlations. The strength of the Bayesian models with G×E interactions used under the linear kernel GBLUP or under the nonlinear Gaussian kernel (GK) is that they overcome the limitation of the previous models when associations between environments are negligible or negative, as shown by Cuevas et al. [56]. These authors proposed considering the genetic effects (u) described by the Kronecker product of variance-covariance matrices of genetic correlations between environments and genomic kernels through markers under two kernel methods: linear (GBLUP) and Gaussian (GK). An extension includes the same genetic component as the first model (u), plus an extra residual genetic component, f, which captures random effects between environments that were not accounted for by the random effects u (see ‘Multienvironment genotype × environment interaction model with linear and nonlinear kernels’ in the supplementary information online). Results of the analyses of five data sets showed that: (i) G×E interaction models always had significantly higher prediction accuracy than single-environment models; and (ii) the prediction accuracy of G×E models with u and f over the multienvironment model with only u was higher 85% of the time with GBLUP and 45% of the time with GK across the five data sets. Results indicated that including the random effect f was still beneficial for increasing GP accuracy after adjusting for the random effect u. Prediction accuracy of the G×E interaction model methods (GBLUP or GK) increased up to 85% over the accuracy of the single-environment model.
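The two building blocks mentioned above, a Gaussian kernel computed from markers and a Kronecker product with a between-environment covariance matrix, can be sketched as follows. The bandwidth convention (scaling squared distances by their median), the function name `gaussian_kernel`, and the example matrix `U_E` are assumptions for illustration; in the cited models the between-environment covariance is estimated, not fixed.

```python
import numpy as np

def gaussian_kernel(M, h=1.0):
    """K[i, j] = exp(-h * d_ij^2 / median(d^2)), with d_ij the Euclidean marker distance."""
    sq = np.sum(M ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * M @ M.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    med = np.median(d2[np.triu_indices_from(d2, k=1)])  # median of off-diagonal distances
    return np.exp(-h * d2 / med)

rng = np.random.default_rng(0)
J, p = 8, 150                                    # hypothetical numbers of lines and markers
M = rng.binomial(2, 0.5, size=(J, p)).astype(float)
K = gaussian_kernel(M)

# Hypothetical genetic covariance between three environments; note the negative entry,
# which a model built only on a shared main effect could not represent.
U_E = np.array([[1.0, 0.6, -0.2],
                [0.6, 1.0, 0.1],
                [-0.2, 0.1, 1.0]])

# Covariance of u for records sorted by environment, then by line within environment.
cov_u = np.kron(U_E, K)
print(cov_u.shape)
```

Replacing `K` with a linear genomic relationship matrix gives the GBLUP version of the same structure.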
Machine Learning for Genomic Prediction
In an applied GS context, the focus should not be on predicting all individuals, but rather on classifying individuals into upper, middle, or lower classes, depending on the trait under selection. Using classifiers in GS is attractive because they are trained to maximize the probability of an individual being a member of the target class, rather than searching for its overall performance [21]. González-Camacho et al. [18] used a radial basis neural network on an extensive genomic maize data set comprising several traits in different environments and compared the results with RKHS and with a linear regression model. The accuracy of neural network methods was similar to that of RKHS and slightly higher than that of the linear regression.
In a recent study, two neural network classifiers (a multilayer perceptron, MLP, and a probabilistic neural network, PNN) were compared for predicting the probability of an individual being a member of a target phenotypic class, using 33 maize and wheat genomic and phenotypic data sets [19]. The authors focused on the 15th and 30th percentiles of the upper and lower classes to select the best individuals, as commonly done in GS (for traits such as grain yield, the upper classes are the target; for diseases, the focus is on the lower classes). The criterion for assessing the prediction accuracy of MLP and PNN was the area under the receiver operating characteristic curve (AUC). The parameters of both classifiers were estimated by optimizing the AUC for a specific target class. The PNN was found to be more accurate than the MLP for assigning maize and wheat lines to the correct upper, middle, or lower class. Results for the wheat data set with continuous traits split into two and three classes showed that the performance of PNN with three classes was better than that of PNN with two classes when classifying individuals into the upper and lower (15% or 30%) categories. Depending on the maize trait–environment combination, the AUC of PNN30% or PNN15% for the upper class of grain yield was higher than the AUC of MLP. For the lower class of the flowering traits (male and female), PNN15% and PNN30% always had better AUC than MLP15% and MLP30% [19].
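As a rough illustration of the AUC criterion for a target class, the sketch below labels the upper 30% of a set of observed values as the target class and scores a classifier against those labels. The yields and classifier scores are invented for the example, and the rank-based AUC formula is a generic one, not the procedure of the cited study.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count target/non-target pairs ranked correctly; ties count one half.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def upper_class_labels(values, fraction):
    """Label the top `fraction` of observed values as 1 and the rest as 0."""
    n_top = max(1, round(fraction * len(values)))
    cutoff = sorted(values, reverse=True)[n_top - 1]
    return [1 if v >= cutoff else 0 for v in values]


# Hypothetical observed grain yields (t/ha) and classifier scores for ten lines.
observed = [5.1, 6.3, 4.8, 7.0, 5.9, 6.8, 5.2, 4.5, 6.1, 5.5]
scores = [0.20, 0.70, 0.10, 0.90, 0.40, 0.60, 0.65, 0.05, 0.50, 0.35]

labels_30 = upper_class_labels(observed, 0.30)  # upper 30% is the target class
auc_30 = auc(scores, labels_30)
```

For a trait in which low values are desirable, such as disease severity, the same functions can be applied to the negated values so that the lower class becomes the target.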
Genetic Gains from Rapid Selection Cycle GS: CIMMYT Maize Biparental Populations
The fundamental goal of using GS in breeding is to achieve greater genetic gains at a lower cost and in less time than with conventional pedigree breeding. To achieve a shorter interval cycle, a favorable use of GS is prediction within full-sib families, because biparental populations have high LD between marker alleles and trait alleles with no group structure.
Few studies have measured the genetic gains achieved through use of a GS-based rapid selection cycle. The first study confirming the promise of a rapid selection cycle in GP of biparental populations, as well as previous findings from random cross-validation studies, was conducted by Massman et al. [44] and showed that GS improved maize genetic gains per unit of time. Genetic gains were also reported by Asoro et al. [45] in oat and by Rutkoski et al. [48] in wheat; the latter showed that GS and PS provided similar realized genetic gains per unit of time.
Genetic gain studies comparing GS cycles C0, C1, C2, and C3 with pedigree selection on eight CIMMYT tropical biparental maize populations in Sub-Saharan Africa were conducted by Beyene et al. [47] under drought conditions. The authors showed that: (i) the average gain per cycle (across all eight biparental populations) from GS was 0.086 t/ha under managed drought conditions; (ii) the average grain yield of C3-derived hybrids was significantly higher than that of hybrids derived from C0; and (iii) three GS cycles can be achieved in 1 year. The authors concluded that hybrids derived from C3 produced 7.3% higher grain yield than those developed through conventional pedigree breeding. By contrast, the average gain per cycle using marker-assisted recurrent selection (MARS) across ten populations was 0.051 t/ha under managed drought stress. Extensive field trials were conducted in several managed drought environments in Sub-Saharan Africa to evaluate the grain yield performance of maize hybrids derived from GS-based lines selected from different populations. Lines derived from cycle C3 GS in hybrid combination produced significantly higher average grain yield than lines from C0 GS in hybrid combination (Y. Beyene, 2017).
Another example of genetic gains from rapid-cycle GS on CIMMYT maize is two biparental maize populations (F2:3) from Asia (CAP1 and CAP2) that were developed and evaluated for testcross performance under drought and optimal conditions [72]. The genetic gains per year for PS versus GS in drought environments were 0.067 t/ha versus 0.124 t/ha, respectively, for CAP1, and 0.076 t/ha versus 0.104 t/ha, respectively, for CAP2. The corresponding genetic gains per year for PS versus GS in optimal environments were 0.084 t/ha versus 0.140 t/ha, respectively, for CAP1, and 0.123 t/ha versus 0.13 t/ha, respectively, for CAP2. Results of this study confirmed that GS of superior plant phenotypes produced rapid genetic gains in drought tolerance in maize.
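To make the comparison between PS and GS easier to read, the short sketch below expresses the gains per year quoted above as GS-to-PS ratios. Only the eight gain values come from the text; the function name and the layout of the dictionary are illustrative choices.

```python
# Genetic gains per year (t/ha) quoted in the text for PS and GS.
gains = {
    ("CAP1", "drought"): {"PS": 0.067, "GS": 0.124},
    ("CAP2", "drought"): {"PS": 0.076, "GS": 0.104},
    ("CAP1", "optimal"): {"PS": 0.084, "GS": 0.140},
    ("CAP2", "optimal"): {"PS": 0.123, "GS": 0.13},
}


def gs_advantage(ps, gs):
    """Return the GS-to-PS ratio and the percentage difference relative to PS."""
    ratio = gs / ps
    return ratio, 100.0 * (ratio - 1.0)


summary = {key: gs_advantage(v["PS"], v["GS"]) for key, v in gains.items()}
for (population, environment), (ratio, pct) in summary.items():
    print(f"{population} {environment}: GS/PS = {ratio:.2f} ({pct:+.0f}% relative to PS)")
```

On these figures the GS gain per year exceeds the PS gain in all four population-by-condition combinations, with the largest ratio for CAP1 under drought and the smallest for CAP2 under optimal conditions.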
Genetic Gains from Rapid Selection Cycle GS: CIMMYT Multiparental Populations
Most GS results in maize have been achieved by rapid cycling of biparental populations (e.g., F2:3 segregating populations crossed with a tester from the opposite heterotic group). Five years ago, the Global Maize Program of CIMMYT designed a GS rapid cycle of multiparental crosses. Fifteen elite tropical maize lines were crossed in diallelic fashion to form cycle 0 (C0), which was genotyped using GBS markers and phenotyped at two locations in Mexico; plants with the best phenotype were selected to form the parents for GS cycle 1 (C1). The C1 parents were intercrossed and the progeny genotyped with the same GBS markers used for the C0 population [73].
Two cycles per year were completed and, at the end of the second year, seeds from cycles C0, C1, C2, C3, and C4 were collected, assembled, and sown at two locations in Mexico. Fifty entries per genomic cycle were sown at each location, together with two widely used commercial tropical maize hybrids. Average genomic grain yield gains reached 0.134 t/ha, with C0 producing 6.653 t/ha. Grain yield of C1 was slightly lower (6.488 t/ha), and cycles C2, C3, and C4 produced mean yields of 7.022, 6.879, and 7.126 t/ha, respectively. The realized grain yield gain from C1 to C4 reached 0.225 t/ha per cycle, which the authors considered equivalent to 0.1 t/ha per year over a breeding period of 4.5 years. A slight decline in genetic diversity was detected at C4 compared with C0.
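One way to read the cycle means quoted above is as a linear trend across cycles. The sketch below fits an ordinary least-squares slope to the five means from C0 to C4; treating the gain as a regression slope is an assumption made here for illustration, since the text does not state how the reported gains were computed.

```python
# Mean grain yield (t/ha) per genomic selection cycle, as quoted in the text.
cycles = [0, 1, 2, 3, 4]
mean_yield = [6.653, 6.488, 7.022, 6.879, 7.126]


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


gain_per_cycle = ols_slope(cycles, mean_yield)  # t/ha per cycle, C0 to C4
print(f"Linear trend across C0 to C4: {gain_per_cycle:.3f} t/ha per cycle")
```

Under this assumption the slope is about 0.134 t/ha per cycle, which agrees with the average gain quoted in the paragraph above.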
Prospects for Enhanced Use of GS in Plant Breeding
To accelerate the deployment of GS in crop breeding while reducing the cost of line and hybrid development, here we examine the combined use of GS with high-throughput phenotyping (HTP) for early-generation testing in plant breeding. In addition, we examine the application of GS in germplasm enhancement and prebreeding using gene bank accessions.
Combining Multi-Trait Multienvironment GS with High-Throughput Phenotyping: The CIMMYT Case
In modern agriculture, high-resolution cameras are used to obtain hundreds of reflectance measurements at discrete narrow bands (wavelengths) covering the whole spectrum. This information is used to construct vegetation indices (e.g., the Green Normalized Difference Vegetation Index or the Normalized Difference Vegetation Index, NDVI) to predict primary traits (e.g., grain yield). However, these indices use only certain bands and are cultivar specific; thus, they fail to capture considerable information or perform robustly for all cultivars. The advantage of this imaging technology is that massive numbers of phenotypes can be screened inexpensively during early-generation testing.
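The point that such indices use only certain bands can be seen from their standard definitions: the NDVI is (NIR − Red)/(NIR + Red) and the green variant replaces the red band with the green band. The sketch below applies both to invented reflectance values for three plots; the numbers and variable names are assumptions for illustration only.

```python
def normalized_difference(band_a, band_b):
    """Normalized difference index (a - b) / (a + b), computed plot by plot."""
    return [(a - b) / (a + b) for a, b in zip(band_a, band_b)]


# Hypothetical mean reflectance per plot in three of the many measured bands.
nir = [0.52, 0.48, 0.60]
red = [0.08, 0.12, 0.05]
green = [0.10, 0.14, 0.09]

ndvi = normalized_difference(nir, red)     # uses only the NIR and red bands
gndvi = normalized_difference(nir, green)  # uses only the NIR and green bands
```

Each index collapses a spectrum of hundreds of bands into a single number per plot, which is why information carried by the remaining wavelengths is not used.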
The main objective of GS is to reduce phenotyping costs by using markers and to accelerate genetic gains, whereas the aim of HTP is to measure high-density phenotypes in very large numbers of individuals or breeding lines across time and space using remote or proximal sensing. This can increase both the accuracy and intensity of selection and, subsequently, the selection response, while decreasing phenotyping costs. The main idea of HTP is to use secondary traits related to grain yield, disease resistance, or end-use quality that may be useful in early-generation testing of lines. A recent study [74] found that the highest accuracy when predicting grain yield is achieved by using a broader selection of wavelengths (see 'Predicting grain yield using canopy hyperspectral reflectance in wheat breeding data' in the supplementary information online).
Using HTP platforms with vegetation indices as predictor traits in pedigree and GS models can increase the prediction accuracy for grain yield [75]. Prediction during early-generation testing is important to enhance genetic gains during early stages of the breeding cycle, but GS is economically unfeasible at those stages due to the large number of plants in the field. Assessing pedigree relationships costs nothing, but, unlike GS, their use fails to exploit Mendelian sampling. Therefore, using multivariate pedigree-based prediction models that incorporate such predictor traits while using an HTP platform offers a low-cost solution for predicting grain yield among and within families in early-generation testing, as demonstrated by Rutkoski et al. [75]. These authors found that within-environment secondary vegetation indices increased prediction accuracies for grain yield by 59% using pedigree relationships and by 70% using genomic relationships. These results indicate that secondary traits measured by HTP and used with pedigree can improve the prediction accuracy of primary traits during the early stages of breeding. Reynolds and Langridge [76] found that HTP techniques are effective for evaluating genetic resources for complex trait expression.
HTP platforms are useful to measure secondary traits that are genetically correlated with grain yield and that can be incorporated in multivariate pedigree and genomic prediction models, improving indirect selection for grain yield. Sun et al. [77] used a statistical model that estimated BVs of secondary traits together with multivariate pedigree and genomic models; the authors found a 70% (on average) improvement in the accuracy of selection for grain yield when secondary traits were included in both the TRN and TST populations.
A recent study developed statistical models to assess hyperspectral wavelength × environment interactions in HTP, incorporating genomic and pedigree G×E interactions [78]. Although only low GP accuracy was achieved, important hyperspectral wavelength × environment interactions were observed, demonstrating that GS coupled with HTP can be a powerful tool applied to early-generation testing of a large number of selection candidates. The full conditional distributions for modeling the three-way trait × G×E interaction model of Montesinos-López et al. [9] can be adapted to include the hyperspectral bands in the functional regression approach recently described by Montesinos-López et al. [74,78].
Exploring the Application of GS to Gene Bank Accessions for Germplasm Enhancement
Although the accessions stored in gene banks represent a rich resource for breeders, alleles need to be extracted from the accessions for cultivar development, which is time-consuming and expensive [79,80]. Lengthy prebreeding programs are required to develop lines that combine favorable alleles from the germplasm bank with good agronomic performance and that can be used as parents in a breeding program. Based on the simulation of various prebreeding options, germplasm enhancement breeding programs can start directly from landraces or from landraces crossed with elite testers [81].
The GP accuracy of 8416 Mexican wheat landrace accessions and 2403 Iranian wheat landrace accessions from the CIMMYT gene bank was examined by Crossa et al. [69]. The authors measured two traits in two environments and several highly heritable traits in a single optimum environment. The GP accuracy for several traits (maturity, quality traits, and grain yield and yield components) under the different prediction scenarios was high, ranging from 0.5 to 0.7. In soybean, Jarquín et al. [25] analyzed the USDA soybean collection using different cross-validation schemes and grouping factors (trials, states, and genetic subpopulations); results showed relatively high prediction accuracies that should help breeders to introgress useful genetic variation (Box 1).
These preliminary results on the accuracy of GP of gene bank accessions favor the idea of applying GS to introgress landrace accessions into elite germplasm and to form gene pools and populations suitable for prebreeding and germplasm enhancement programs. However, further research is required in this area.
Concluding Remarks
Many statistical methods have been developed to predict unobserved individuals in GS. In general, linear models (e.g., GBLUP) and machine-learning algorithms have been successful in recognizing complex patterns and making correct decisions based on data. Kernel-based methods, such as the RKHS, have consistently delivered good genomic predictions in plants. Several statistical models based on the standard GBLUP that incorporate G×E interactions in genomic and pedigree predictions have provided substantial increases in the accuracy of predicting unobserved individuals in environments. These GS prediction models can help scientists in different disciplines to develop drought- and heat-tolerant plants by exploiting positive G×E interactions. Modeling multi-trait, multi-environment data is essential for improving the prediction accuracy of the performance of newly developed lines in future years.
The use of statistical models in extensive hyperspectral image technology for HTP, together with genomic and pedigree information in early-generation testing, offers an opportunity to accelerate genetic gains by increasing the intensity of selection. Deep machine-learning methods using neural networks appear promising for increasing the accuracy of genomic-enabled prediction. Genomic selection has a clear-cut advantage over pedigree breeding and MAS for enhancing genetic gains for complex traits. The appropriate use of genotyping platforms combined with precise phenotyping platforms will also help enhance prediction accuracy and accelerate genetic gains by shortening the breeding cycle. Further research is required to incorporate GS with HTP as a routine component in plant breeding programs.
Developing GP models for gene bank accessions will be important to access unexplored diversity and fast-track useful portions into breeding programs (also see Outstanding Questions). Currently, GS is the most promising breeding method to speed the development and release of new genotypes; therefore, the use of GS to form gene pools and populations from rich gene bank accessions merits extensive and intensive study, especially given the vulnerability of elite lines and hybrids to severe climate change effects.
Box 1. An Example of GS Deployment in Prebreeding
Results from using a large number of Mexican and Iranian landraces stored in the wheat gene bank of CIMMYT indicated that prediction accuracies for traits evaluated in one environment for the TRN20-TST80 design ranged from 0.407 to 0.677 for Mexican landraces and from 0.166 to 0.662 for Iranian landraces. Also, prediction accuracies of the 20% core set were similar to those obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces and from 0.182 to 0.647 for Iranian landraces. Interestingly, the correlations for complex GYSM traits were approximately 0.4 for both Mexican and Iranian landraces. Results of correlations when incorporating G×E interactions for days to heading and days to maturity for TRN20-TST80 were approximately 0.61 for Mexican landraces and 0.60 for Iranian landraces. The 20% core set had correlations of approximately 0.58 for Mexican landraces and 0.50–0.59 for Iranian landraces.
Prediction accuracies of the 10% and 20% diversity and predictive core subsets were generally of a magnitude that appeared useful for predicting the value of gene bank accessions and for breeding. The first application would be to predict the genetic value of all genotyped accessions in a gene bank and then phenotype only those that have the highest predicted genetic values. Then, a breeder could begin prebreeding following several strategies. For example, one decision would be to initiate a prebreeding conversion approach by crossing the best accessions among themselves to improve the accessions per se until they become elite. Another strategy could be to start an introgression approach by crossing selected accessions to elite materials.
Nevertheless, application of GS for germplasm enhancement will have to be performed in combination with standard introgression of exotic to elite germplasm and, possibly, a series of backcrosses to the elite material. Extracting the best accessions directly from the gene banks and forming productive gene pools may be the first stage before refining the gene pools and evaluating them under different environmental conditions.
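The TRN20-TST80 design mentioned in this box can be sketched as a random partition in which 20% of the accessions train the model and the remaining 80% are predicted, with accuracy measured as the correlation between predicted and observed values. The example below uses simulated markers and phenotypes, ridge regression on markers as a simple stand-in for a GBLUP-type model, and an arbitrary shrinkage parameter; it does not reproduce the models or data of the landrace study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_acc, n_markers = 300, 100

# Simulated stand-in data: centered 0/1/2 markers and a polygenic trait.
X = rng.integers(0, 3, size=(n_acc, n_markers)).astype(float)
X -= X.mean(axis=0)
beta = rng.normal(0.0, 1.0, n_markers)
g = X @ beta
y = g + rng.normal(0.0, g.std(), n_acc)  # heritability of roughly 0.5


def trn_tst_accuracy(X, y, trn_fraction, lam, rng):
    """Correlation between predicted and observed values in the testing set."""
    n = len(y)
    idx = rng.permutation(n)
    n_trn = int(round(trn_fraction * n))
    trn, tst = idx[:n_trn], idx[n_trn:]
    mu = y[trn].mean()
    # Ridge regression on markers written in its dual (kernel) form, so the
    # linear system has the size of the training set.
    K_trn = X[trn] @ X[trn].T
    alpha = np.linalg.solve(K_trn + lam * np.eye(n_trn), y[trn] - mu)
    y_hat = mu + X[tst] @ X[trn].T @ alpha
    return float(np.corrcoef(y_hat, y[tst])[0, 1])


# Twenty random TRN20-TST80 partitions; lam is an arbitrary shrinkage value.
accuracies = [trn_tst_accuracy(X, y, 0.20, lam=float(n_markers), rng=rng)
              for _ in range(20)]
mean_accuracy = float(np.mean(accuracies))
```

In the application outlined above, the same fitted model would be used to rank all genotyped accessions by predicted value so that only the top candidates are phenotyped.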
Outstanding Questions
How can GS have an important role in enhancing the rate of genetic gain?
Are currently available GS models capable of providing the most accurate predictions?
How can GS accelerate the flow of favorable alleles directly from the gene bank to breeding programs?
Are we ready to deploy GS in prebreeding and germplasm-enhancement programs?
Is it possible to combine GS and pedigree selection with HTP to accelerate genetic gains and save resources when developing lines during early-generation testing?
Acknowledgments
The authors are grateful to their colleagues, collaborators, and field and lab technicians from CIMMYT and ICRISAT, as well as scientists in National Programs who collected the valuable data used in the various studies. The authors would like to thank Kevin Pixley, Daniel Gianola, and four anonymous reviewers for their positive, careful, and detailed reviews of the manuscript; their comments, editing, and suggestions significantly improved the quality and readability of the manuscript. The authors also thank Michael Listman for editing the revised version of the manuscript.
Supplemental Information
Supplemental information associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.tplants.2017.08.011.
References
1. Bernardo, R. (2008) Molecular markers and selection for complex traits in plants: learning from the last 20 years. Crop Sci. 48, 1649–1664
2. Bernardo, R. (2016) Bandwagons I, too, have known. Theor. Appl. Genet. 129, 2323–2332
3. Meuwissen, T.H.E. et al. (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157, 1819–1829
4. Pszczola, M. et al. (2012) Reliability of direct genomic values for animals with different relationships within and to the reference population. J. Dairy Sci. 95, 389–400
5. Daetwyler, H.D. et al. (2010) The impact of genetic architecture on genome-wide evaluation methods. Genetics 185, 1021–1031
6. Isidro, J. et al. (2015) Training set optimization under population structure in genomic selection. Theor. Appl. Genet. 128, 145–158
7. Lorenz, A.J. et al. (2012) Potential and optimization of genomic selection for Fusarium head blight resistance in six-row barley. Crop Sci. 52, 1609–1621
8. Arruda, M.P. et al. (2015) Genomic selection for predicting fusarium head blight resistance in a wheat breeding program. Plant Genome 8, 1–12
9. Montesinos-López, O.A. et al. (2016) A genomic Bayesian multi-trait and multi-environment model. G3: Genes|Genomes|Genetics 6, 2725–2744
10. Poland, J. and Rutkoski, J. (2016) Advances and challenges in genomic selection for disease resistance. Annu. Rev. Phytopathol. 54, 79–98
11. Bellman, R.E. (1961) Adaptive Control Processes: A Guided Tour, Princeton University Press
12. Cuevas, J. et al. (2014) Bayesian genomic-enabled prediction as an inverse problem. G3: Genes|Genomes|Genetics 4, 1991–2001
13. de los Campos, G. et al. (2012) Whole genome regression and prediction methods applied to plant and animal breeding. Genetics 193, 327–345
14. Cerón-Rojas, J.J. et al. (2015) A genomic selection index applied to simulated and real data. G3: Genes|Genomes|Genetics 5, 2155–2164
15. LeCun, Y. et al. (2015) Deep learning. Nature 521, 436–444
16. Gianola, D. et al. (2006) Genomic-assisted prediction of genetic values with a semi-parametric procedure. Genetics 173, 1761–1776
17. Gianola, D. et al. (2011) Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genet. 12, 87
18. González-Camacho, J.M. et al. (2012) Genome-enabled prediction of genetic values using Radial Basis Function Neural Networks. Theor. Appl. Genet. 125, 759–771
19. González-Camacho, J.M. et al. (2016) Genome-enabled prediction using probabilistic neural network classifiers. BMC Genomics 17, 208
20. Pérez-Rodríguez, P. et al. (2012) A comparison between linear and non-parametric regression models for genome-enabled prediction in wheat. G3: Genes|Genomes|Genetics 2, 1595–1605
21. Ornella, L. et al. (2014) Genomic-enabled prediction with classification algorithms. Heredity 112, 616–626
22. Gonzalez-Recio, O. et al. (2014) Machine learning methods and predictive ability metrics for genome-wide prediction of complex traits. Livest. Sci. 166, 217–231
23. Burgueño, J. et al. (2012) Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Sci. 52, 707–719
24. Jarquín, D. et al. (2014) A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor. Appl. Genet. 127, 595–607
25. Jarquín, D. et al. (2016) Genotyping by sequencing for genomic prediction in a soybean breeding population. BMC Genomics 15, 740
26. Jarquín, D. et al. (2017) Increasing genomic-enabled prediction accuracy by modeling genotype × environment interaction in Kansas wheat. Plant Genome Published online June 8, 2017. http://dx.doi.org/10.3835/plantgenome2016.12.0130
27. Saint-Pierre, C. et al. (2016) Genomic prediction models for grain yield of spring bread wheat in diverse agro-ecological zones. Sci. Rep. 6, 27312
28. Bernardo, R. and Yu, J.M. (2007) Prospects for genome-wide selection for quantitative traits in maize. Crop Sci. 47, 1082–1090
29. Lorenzana, R.E. and Bernardo, R. (2009) Accuracy of genotypic value predictions for marker-based selection in biparental plant populations. Theor. Appl. Genet. 120, 151–161
30. Heffner, E.L. et al. (2009) Genomic selection for crop improvement. Crop Sci. 49, 1–12
31. Heffner, E.L. et al. (2010) Plant breeding with genomic selection: gain per unit time and cost. Crop Sci. 50, 1681–1690
32. de los Campos, G. et al. (2009) Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics 182, 375–385
33. de los Campos, G. et al. (2010) Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods. Genet. Res. 92, 295–308
34. Crossa, J. et al. (2010) Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186, 713–724
35. Crossa, J. et al. (2011) Genomic selection and prediction in plant breeding. J. Crop Improv. 25, 239–261
36. Crossa, J. et al. (2013) Genomic prediction in maize breeding populations with genotyping-by-sequencing. G3: Genes|Genomes|Genetics 3, 1903–1926
37. Crossa, J. et al. (2014) Genomic prediction in CIMMYT maize and wheat breeding programs. Heredity 112, 48–60
38. Pérez-Rodríguez, P. and de los Campos, G. (2014) Genome-wide regression and prediction with the BGLR statistical package. Genetics 198, 483–495
39. Hickey, J.M. et al. (2012) Factors affecting the accuracy of genotype imputation in populations from several maize breeding programs. Crop Sci. 52, 654–663
40. Riedelsheimer, C. et al. (2012) Genomic and metabolic prediction of complex heterotic traits in hybrid maize. Nat. Genet. 44, 217–220
41. Zhao, Y. et al. (2012) Accuracy of genomic selection in European maize elite breeding populations. Theor. Appl. Genet. 124, 769–776
42. Windhausen, V.S. et al. (2012) Effectiveness of genomic prediction of maize hybrid performance in different breeding populations and environments. G3: Genes|Genomes|Genetics 2, 1427–1436
43. Technow, F. et al. (2013) Genomic prediction of northern corn leaf blight resistance in maize with combined or separated training sets for heterotic groups. G3: Genes|Genomes|Genetics 3, 197–203
44. Massman, J.M. et al. (2013) Genome-wide selection versus marker-assisted recurrent selection to improve grain yield and stover-quality traits for cellulosic ethanol in maize. Crop Sci. 53, 58–66
45. Asoro, F.G. et al. (2013) Genomic, marker-assisted, and pedigree-BLUP selection methods for β-glucan concentration in elite oat. Crop Sci. 53, 1894–1906
46. Combs, E. and Bernardo, R. (2013) Genome-wide selection to introgress semidwarf corn germplasm into U.S. Corn Belt inbreds. Crop Sci. 53, 1427–1436
47. Beyene, Y. et al. (2015) Genetic gains in grain yield through genomic selection in eight bi-parental maize populations under drought stress. Crop Sci. 55, 154–163
48. Rutkoski, J. et al. (2015) Genetic gain from phenotypic and genomic selection for quantitative resistance to stem rust of wheat. Plant Genome 8, 1–10
49. VanRaden, P.M. (2007) Genomic measures of relationship and inbreeding. Interbull Annu. Meet. Proc. 37, 33–36
50. VanRaden, P.M. (2008) Efficient methods to compute genomic predictions. J. Dairy Sci. 91, 4414–4423
51. Heslot, N. et al. (2012) Genomic selection in plant breeding: a comparison of models. Crop Sci. 52, 146–160
52. Heslot, N. et al. (2014) Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions. Theor. Appl. Genet. 127, 463–480
53. López-Cruz, M. et al. (2015) Increased prediction accuracy in wheat breeding trials using a marker × environment interaction genomic selection model. G3: Genes|Genomes|Genetics 5, 569–582
54. Crossa, J. et al. (2016) Extending the Marker × Environment interaction model for genomic-enabled prediction and genome-wide association analyses in durum wheat. Crop Sci. 56, 1–17
55. Cuevas, J. et al. (2016) Genomic prediction of genotype × environment interaction kernel regression models. Plant Genome Published online September 22, 2016. http://dx.doi.org/10.3835/plantgenome2016.03.0024
56. Cuevas, J. et al. (2017) Bayesian genomic prediction with genotype × environment interaction kernel models. G3: Genes|Genomes|Genetics 7, 41–53
57. Montesinos-López, O.A. et al. (2015) Threshold models for genome-enabled prediction of ordinal categorical traits in plant breeding. G3: Genes|Genomes|Genetics 5, 291–300
58. Montesinos-López, O.A. et al. (2015) Genomic-enabled prediction of ordinal data with Bayesian logistic ordinal regression. G3: Genes|Genomes|Genetics 5, 2113–2126
59. Montesinos-López, O.A. et al. (2015) Genomic prediction models for count data. J. Agric. Biol. Environ. Stat. 20, 533–554
60. Montesinos-López, O.A. et al. (2016) Genomic Bayesian prediction model for count data with genotype × environment interaction. G3: Genes|Genomes|Genetics 6, 1165–1177
61. Montesinos-López, O.A. et al. (2017) A Bayesian Poisson-lognormal model for count data for multiple-trait multiple-environment genomic-enabled prediction. G3: Genes|Genomes|Genetics 7, 1595–1606
62. Gianola, D. and van Kaam, J.B.C.H.M. (2008) Reproducing kernel Hilbert space regression methods for genomic-assisted prediction of quantitative traits. Genetics 178, 2289–2303
63. Philomin, J. et al. (2017) Comparison of models and whole-genome profiling approaches for genomic-enabled prediction of Septoria tritici blotch, Stagonospora nodorum blotch, and tan spot resistance in wheat. Plant Genome 10, 1–16
64. Philomin, J. et al. (2017) Genomic and pedigree-based prediction for leaf, stem, and stripe rust resistance in wheat. Theor. Appl. Genet. 130, 1415–1430
65. Varshney, R.K. (2016) Exciting journey of 10 years from genomes to fields and markets: some success stories of genomics-assisted breeding in chickpea, pigeonpea and groundnut. Plant Sci. 242, 98–107
66. Roorkiwal, M. et al. (2016) Genome-enabled prediction models for yield related traits in chickpea. Front. Plant Sci. 7, 1666
67. Sukumaran, S. et al. (2017) Genomic prediction with pedigree and genotype × environment interaction in spring wheat grown in South and West Asia, North Africa, and Mexico. G3: Genes|Genomes|Genetics 7, 481–495
68. Pérez-Rodríguez, P. et al. (2015) A pedigree reaction norm model for prediction of cotton (Gossypium sp.) yield in multi-environment trials. Crop Sci. 55, 1143–1151
69. Crossa, J. et al. (2016) Genomic prediction of gene bank wheat landraces. G3: Genes|Genomes|Genetics 6, 1819–1834
70. Govidan, V. et al. (2016) Genomic prediction for grain zinc and iron concentrations in spring wheat. Theor. Appl. Genet. 129, 1595–1605
71. Pérez-Rodríguez, P. et al. (2017) Single-step genomic and pedigree genotype × environment interaction models for predicting wheat lines in international environments. Plant Genome Published online May 5, 2017. http://dx.doi.org/10.3835/plantgenome2016.09.0089
72. Vivek, B.S. et al. (2017) Use of genomic estimated breeding values results in rapid genetic gains for drought tolerance in maize. Plant Genome 10, 1–8
73. Zhang, X. et al. (2017) Rapid cycling genomic selection in a multiparental tropical maize population. G3: Genes|Genomes|Genetics 7, 1–12
74. Montesinos-López, O.A. et al. (2017) Predicting grain yield using canopy hyperspectral reflectance in wheat breeding data. Plant Methods 13, 4
75. Rutkoski, J. et al. (2016) Predictor traits from high-throughput phenotyping improve accuracy of pedigree and genomic selection for grain yield in wheat. G3: Genes|Genomes|Genetics 6, 2799–2808
76. Reynolds, M. and Langridge, P. (2016) Physiological breeding. Curr. Opin. Plant Biol. 31, 162–171
77. Sun, J. et al. (2017) Multi-trait, random regression, or simple repeatability model in high-throughput phenotyping data improve genomic prediction for grain yield in wheat. Plant Genome Published online May 18, 2017. http://dx.doi.org/10.3835/plantgenome2016.11.011
78. Montesinos-López, A. et al. (2017) Genomic Bayesian functional regression models with interactions for predicting wheat grain yield using hyper-spectral image data. Plant Methods 13, 62
79. McCouch, S. et al. (2012) Genomics of gene bank: a case study for rice. Am. J. Bot. 99, 407–423
80. Yu, X. et al. (2016) Genomic prediction contributing to a promising global strategy to turbocharge gene banks. Nat. Plants 2, 16150
81. Gorjanc, G. et al. (2016) Initiating maize pre-breeding programs using genomic selection to harness polygenic variation from landrace populations. BMC Genomics 17, 30