eupa open proteomics 3 (2014)128–137
Available
online
at
www.sciencedirect.com
ScienceDirect
j o ur na l h o me p a g e:h t t p : / / w w w . e l s e v i e r . c o m / l o c a t e / e u p r o t
Little
things
make
big
things
happen:
A
summary
of
micropeptide
encoding
genes
Jeroen
Crappé
∗,
Wim
Van
Criekinge,
Gerben
Menschaert
∗LabofBioinformaticsandComputationalGenomics(BioBix),DepartmentofMathematicalModelling,StatisticsandBioinformatics, FacultyofBioscienceEngineering,GhentUniversity,9000Ghent,Belgium
a
r
t
i
c
l
e
i
n
f
o
Articlehistory:Received31October2013 Receivedinrevisedform 6December2013
Accepted14February2014 Availableonline6March2014
Keywords: Micropeptide sORF Ribosomeprofiling Peptidomics (l)ncRNA Bioactivepeptide
a
b
s
t
r
a
c
t
Classicalbioactivepeptidesarecleavedfromlargerprecursorproteinsandaretargeted towardthesecretorypathwaybymeansofanN-terminalsignalingsequence.Incontrast, micropeptidesencodedfromsmallopenreadingframes,lacksuchsignalingsequenceand areimmediatelyreleasedinthecytoplasmaftertranslation.Overthepastfewyearsmany suchnon-canonicalgenes(includingopen readingframes, ORFssmaller than100AAs) havebeendiscoveredandfunctionally characterizedin differenteukaryoticorganisms. Furthermore,insilicoapproachesenabledthepredictionoftheexistenceofmanymore putativelycodingsmallORFsinthegenomesofSacharomycescerevisiae,Arabidopsisthaliana, DrosophilamelanogasterandMusmusculus.However,questionsremainastowhatthe func-tionalroleofthisnewclassofeukaryoticgenesmightbe,andhowwidespreadtheyare.In thefuture,approachesintegratinginsilico,conservation-basedpredictionanda combina-tionofgenomic,proteomicandfunctionalvalidationmethodswillprovetobeindispensable toanswertheseopenquestions.
©2014TheAuthors.PublishedbyElsevierB.V.onbehalfofEuropeanProteomics Association(EuPA).
1.
Introduction
Itis awell-known fact that small peptides playimportant rolesinallkindsofbiologicalprocesses[1].Thelargestand mostextensivelystudiedclassofsmallpeptidescomprises classicalbioactivepeptides.Theseareenzymaticallycleaved fromlongerproteinprecursorscontaininganN-terminal sig-nalsequence,hencedirectingthetranslationproducttoward the secretorypathway (seeFig. 1). Oncereleased from the secretoryvesicles,mostofthesepeptides actasligandsof
∗ Correspondingauthorsat:LabofBioinformaticsandComputationalGenomics(BioBix),DepartmentofMathematicalModelling,Statistics andBioinformatics,FBE,GhentUniversity,CoupureLinks653,9000Ghent,Belgium.Tel.:+3292649922.
E-mailaddresses:[email protected](J.Crappé),[email protected],[email protected]
(G.Menschaert).
membrane receptors (mostly G protein-coupled receptors) andexerttheirextra-cellularbiologicalsignalingfunctionin a autocrine, paracrine or endocrine way [2]. Examples are neuropeptides, peptidehormonesand growth factors [3–6]. Othersecretedpeptidesexercisetheirfunctioninhostdefense systemshavingantimicrobialortoxicpropertiesorshow anti-hypertensive, antithromboticor antioxidative activity [7,8]. Recently,other (non-classical) peptides– encodedbysmall open reading frames (sORFs) – have been discovered, pre-sumablydefininganeweukaryoticgenefamily[9–16].These so-called micropeptides are translated from sORFs shorter
http://dx.doi.org/10.1016/j.euprot.2014.02.006
2212-9685 ©2014TheAuthors.PublishedbyElsevierB.V.onbehalfofEuropeanProteomicsAssociation(EuPA).Open access under CC BY-NC-ND license. Open access under CC BY-NC-ND license.
Fig.1–Localizationofclassicalbioactivepeptidesandmicropeptides.ClassicalbioactivepeptidescontainanN-terminal signalsequencedirectingthetranslationproducttowardthesecretorypathway.Asaconsequence,thesebiologicallyactive peptidesexertanextra-cellularfunction.Incontrast,micropeptideslackanN-terminalsignalingsequence,andare consequentlyreleasedinthecytoplasmimmediatelyaftertranslation.
than 100AAs[14,17]. Sometimestheseare alsoreferred to aspolycistronicpeptidesinthecaseweretheyaretranslated fromapolycistronicmRNA[13]orasshortopenreadingframe (sORF)-encodedpolypeptides(SEPs)[18].Incontrasttoother bioactivepeptides,micropeptidesarenotcleavedfromalarger precursorproteinandlackanN-terminalsignalingsequence. Assuchtheyareinprinciplereleasedinthecytoplasm imme-diatelyaftertranslation.Thisreviewfocusesonthisnewclass ofpeptides(seeFig.1).
Inthepast,manymoleculeshavebeenoverlookedbecause ofvariousbiasesand/orsimplificationsintroducedinthe per-formeddiscovery strategy.Forexample,it wasonlyin1993 thatthefirstmicroRNA(lin-4)wasdiscoveredin Caenorhab-ditis elegans [19]. In the 2 subsequent decades more than 2500microRNAswereidentifiedinhumanalone [20]. Since micropeptidescameintothelimelight,evermoreresearchis conductedtothisnewtypeofbiomolecules,providing increas-ingevidencethatthis typeofbiomoleculesispossiblyalso longoverlooked[17].Itwasassumed,especiallyfor compre-hensivecDNAannotationstudies,thatprotein-codinggenes donotcodefortranslationproductsshorterthan100AAs[21]. Thisarbitrarilychosenminimumlengthreducesthe likeli-hoodoffalse-positivedetectionbygene-predictionsoftware and genome annotation algorithms, but at the same time vastly underestimates the true number of (atypical) small proteins[22,23].Thisgeneralizationisalsonoticeablein (man-uallycurated)proteindatabasessuchasSwissProt-KB,where atthetimeofwritingonly680(3.4%)outofatotalof20,271 reviewedhumanproteinshavealengthshorterthan100AAs. Althoughmicropeptideresearch isnotyet widespreadand muchremains to belearnedabout theirabundance, func-tionalactivityand localization,ahandful ofthesepeptides haverecentlybeenfunctionallyannotatedindifferent eukary-oticorganisms(seenextparagraphforanextensiveoverview or Table 1 fora brief summary). Though importantto an argumentofthegeneralconservationandfunctionacrossall kingdomsoflife,sORFsinbacteriaandviruses[24–28],willnot
becoveredinthisreview.Theabove-mentionedreferencescan serveasabriefoverviewofputativelycodingsORF(pcsORF) detectioninlowerorganisms.
2.
Overview
of
functionally
annotated
micropeptides
Thefirsteukaryoticmicropeptidewasonlydescribedin1996. Whileinvestigatingthefunctionofearlynodulin40(Enod40), formerlyannotatedasancRNAgeneinlegumes,vandeSande etal.transformedtobaccoplantswithasoybean GmENOD40-2 construct [29], proving that this construct was active in the non-legume tobacco, modulating the action of auxin. Sequence comparison of the tobacco and legume Enod40 clones revealed a highly conserved sORF coding for a 10 (tobacco)or 12(soybean)AAslong peptide[29].Lateron, a secondoverlappingcodingsORFof24AAwasidentifiedin soy-bean,categorizingEnod40asapolycistronicmRNA.Enod40isa well-knownfactorthatfunctionsinrootnodule organogene-sisinlegumesandalsodisplaysahighsequenceconservation amongotherplantspeciesincludingmonocots,suggestinga moregeneralbiologicalfunction[9].Inaddition,Enod40shows a highly conserved secondary topology, giving it the char-acteristicsofastructuralRNA[30].Thepresenceofpeptide encodingsORFs andofstructuredRNA,bothplaying arole indevelopmentalprocesses,indicatesthatEnod40actsasa bi-functionalordualRNA[31,32].Sincethediscoveryofthis first micropeptide in plants, others have been functionally annotated.InArabidopsis,thePOLARIS(PLS)gene,identified as apromoter traptransgenic linepredominantly showing expressionintheembryonicbasalregion,affectsrootgrowth andvasculardevelopment[10].Mutationanalysishasshown that the36 AAspeptideencodedbyPLS interactswithPIN proteins,forminganetworkthatplaysanimportantrolein thehormonalcrosstalkbetweenauxin,ethyleneandcytokinin
130
eupa open proteomics 3 (2014)128–137Table1–FunctionallyannotatedmicropeptidesinEukaryotes.
Genename sORFlengtha Speciesb Proposedfunction Conservation References Earlynodulin40(Enod40) 12,24 G.max rootnoduleorganogenesis Legumes&
Monocots
[9,29,30]
POLARIS(PLS) 36 A.thaliana hormonalcrosstalkduring
embryogenesis
[10,33,34]
Brick1(Brk1) 76 Z.mays morphogenesisofleafepithelia PlantsandAnimals
(HSPC300homolog)
[11,35,36,46]
Rotundifolia(ROT4) 53 A.thaliana leafshapemorphogenesis Plants [12,37]
Zm401p10 89 Z.mays tapetumdevelopment [38,39]
Zm908p11 97 Z.mays pollentubegrowth Poaceae [40]
tarsal-less(tal) 3×11,32 D.melanogaster legandactin-basedcell morphogenesisduring embryogenesis Arthropodsand Daphnia (>440MMyears) [13–15,43–45]
sarcolamban(scl) 28,29 D.melanogaster SERCa2+trafficking Vertebratesand
Arthropods (>550MMyears)
[16]
a Numberofaminoacidsinthedifferenttranslatedandfunctionalopenreadingframes. b Theorganisminwhichthefunctionhasbeenbeststudiedexperimentally.
toseveral morphologicaldefects of leafepithelia [11]. The geneishighlyconservedinplantsaswellasinanimalsand encodesa76AApeptidethatlacksany targetingsequence. ResearchinArabidopsishasshownthatBrk1isacritical WAVE-complexsubunitfunctioninginapathwaywiththeARP2/3 complex[35,36].ROTUNDIFOLIA(ROT4)wasidentified asan overexpressednovelsingleexongeneencodingasmall53AAs peptideinanArabidopsismutantwithshortleafsandfloral organs[37].PhylogeneticanalysisinArabidopsisindicatesthat ROT4definesanovelseed-plantspecificsmallpeptidegene family,comprising22ROTFOURLIKE(RTFL)genes,sharinga conserved29AAsregion[12].Morerecently,twonovelmaize sORFgenes,Zm401p10andZm908p11respectivelyencoding89 and97AApeptideswereidentified,playingarequiredrolein pollendevelopment[38–40].
Micropeptideresearchisnotlimitedtoplants;someofthe best-studiedsORFgeneshavebeenidentifiedintheanimal kingdom.InsilicopredictionanalysisofcDNAsinDrosophila melanogasteridentifiedseveralmRNA-likencRNAcandidates putativelyencodingORFs [41,42].Extensivestudyononeof thesecandidatesbyseveralgroupsrevealedthatthe evolu-tionaryconservedtarsal-less(tal)orpolishedrice(pri)inDrosphila andtheorthologous mille-pattes(mlpt)inTriboliumisinfact apolycistronicgeneencodingsmallpeptides[13–15].Thetal genecontainsatotalof4sORFsencodingfunctionally redun-dantpeptidesoflength11–32AAs,playingaroleinDrosophila embryogenesis.Theabsenceofafunctionaltalgene,eitherby knock-downorviamutation,leadstotheabsenceoftrichomes onthebodysurface,amissingtarsalregion,ectopicleg mor-phogenesisandabnormaldenticalbeltandtrachealformation
[13,14,43].Ontheotherhand,theoverexpressionoftal pep-tidesnegativelymodulatestheNotchsignalingpathway[44]. Extensive molecular analysis showed that tal controls epi-dermaldifferentiationbymodifyingthetranscriptionfactor Shavenbaby(Svb)fromatranscriptionalactivatortoarepressor, thus dominantlyinhibiting its downstreamfunction in tri-chomeformation[45].Similarly,RNAidepletedmlptembryos alter gap gene expression, generally leading to shortened embryoswithadditionalpairsoflegs,alsomissingsome pos-teriorabdominalsegments[15].
Recently,anothermemberofthissetofmRNAlike ncR-NAs(alsocomprisingthetalgene)ledtotheidentificationof twonewcoding sORFs(28and 29AAslong)inthe putative noncodingRNA003gene(pncr003:2L)[16,41].Phylogenetic con-servationanalysisindicatedthatthepncr003:2Lgeneshares a common sORF encoding ancestor gene with the human sarcolipin (sln)and its longer paralogue phospholamban(pln). Inordertoreflecttheirsimilaritytheresearcherssuggested torename thepncr003:3Lgeneand itsarthropodhomologs tosarcolamban(scl).VisualizingintracellularCa2+levelsinscl mutantsandwild-typecontrolsidentifiedaprimaryrolefor sclencodedpeptidesduringtheCa2+traffickingatthe sarco-endoplasmatic reticulum (SER) which isrequired for heart musclecontraction[16].
3.
Micropeptide-specific
characteristics
In contrasttoclassicalbioactivepeptides, oneofthe prop-ertiesmicropeptides shareisthe absenceofanN-terminal signaling sequence; as such they are not destined toward the secretory pathway, but are immediately released in the cytoplasm (see Fig. 1) [17]. While one would expect an intracellular, cytoplasmic function for peptides lacking such signal sequences, research indicates that this is not always thecaseand that someoftheidentified micropep-tides act non-cell-autonomously. Examining clonal sectors ofbrk1 mutantcellsinotherwise wild-typeleaves,showed thatbrk1mutantcellsindirectcontactwithwild-typecells appeared to be wild-type with normalstomata, indicating non-cell-autonomous functioning[46].Talisanother exam-ple of a non-celll-autonomous functioning micropeptide. Afterintroducing frame-shift mutationsinthe4ORFs ofa full-length tal transcript and expressing tal in a subset of epithelial cells, denticleformation was completelyrescued intal-expressing aswell asneighboringcells[13].Different mechanismsofsuchmicropeptide-regulatedmorphogenesis ofneighboringcellscan bepostulated.Theiractioncanbe eitherdirectlyasanextra-cellularsignalingmolecule,or indi-rectlyviaadownstreamtarget,orbyintracellularregulationof
Fig.2–AnalysisofDrosophilaSarcolambanhomology.Phylogenetictreeandmultipleproteinsequencealignmentof vertebrateandarthropodSarcolamban(Scl),Sarcolipin(Sln)andPhospholamban(Pln)peptides,constructedwithClustal OmegaandClustalW2-Phylogeny[102–104].
oneormoreintercellularsignalingpathways[13,17,47].More speculative,theirsmallmolecularsizemakesthemideal can-didatesforcellulartranslocationasgappeptides,tofunction asmembranepermeablepeptides,ortoleavethecellvia cell-derived(micro)vesiclessuchasexosomes[48–52].
Conservationofthecodingsequenceacrossverylarge evo-lutionarydistancesisanotherpeculiarfeatureof micropep-tides.Therecentlyidentifiedsarcolambanshowsconservation ofmorethan 550million years(see Fig.2).Analysisofthe relatedpeptidesindicatesaconservedpeptidesequenceand molecularstructurefrom fliestovertebratesall involvedin theregulationofCa2+traffic[16].Tarsal-lessisanother exam-pleofahighlyconservedmicropeptide.Talhomologscould beidentified in other insects and evenin Daphniapulex, a crustaceanspecies.Thisgenefamily isatleast440million yearsoldandshowsavaryingnumberofsORFswithan evo-lutionarytrend toward accumulationof moreORFs [14]. A systematicsearchfornewgenesinSaccharomycescerevisiae[53]
(seealsonextparagraph)ledtotheidentificationof,among others, smORF2. Functional homologsfor this gene with a temperature-sensitivephenotypecouldbeidentifiedinmany organismsfromyeasttohuman[54].Toourknowledge,Brk1 ishoweverbyfar themostconservedsORFencodinggene. Nexttobeing highlyconservedthroughoutthe plant king-dom,homologsforthemaizebrk1genehavebeenidentified inalmostall studiedanimaleukaryoticgenomes,including human,whereHSPC300(mammalianhomologofbrk1)also functionsintheScar/WAVEcomplex[11,36].Itremainstobe
confirmedthatconservationisatypicalcharacteristicof sORF-encodinggenes.Mostdiscoveredmicropeptidegenesresult fromlargegeneticorcomputationalscreenings,focusingon phylogeneticconservationasaproxytofunctionality,inthis waychoosinginterestingtargetsforfurther(elaborate) down-streaminvivoresearch.Thismightintroduceabiastoward discovery ofhighlyconservedsORFs.On theother hand,a highlyconservedstatecanalsopointtotherolemostofthese genesplayin(embryonic)development,morphogenesisand otherverybasicandimportantbiologicalprocesses.Assuch, the highsequenceand functionalconservationofthisnew typeofeukaryoticgeneproducts,mightexplainmanybasic butveryimportantfunctionssharedbyaplethoraofspecies overdifferentkingdoms.
4.
Systematic
searches
for
putatively
coding
sORFs
and
micropeptides
sORF encodinggeneshave,inouropinion,long been over-looked.However,thepastdecadehasseensomeimportant advancesinthe(genome-wide)identificationofpcsORFs.To identify those new and interesting candidates in the vast amountofrandomsORFsscatteredaloverthegenome,insilico strategies(oftenmakinguse ofexpressiondata)have been devised.S.cerevisiaewasthefirsteukaryoticspeciestobethe subjectofsuchasystematicandelaboratescan.Manyyeast sORFs were identified based on comprehensive sequence
132
eupa open proteomics 3 (2014)128–137database searching. Homology, comparative genomics and expressionfeaturesfrom serialanalysisofgeneexpression (SAGE), Northern blotting,RT-PCR and ORF tagging experi-mentswere takeninto account[54–61].Later,Kastenmayer etal.concludedthat,basedontheabove-mentioned indepen-dentexperimentalapproaches andcomputationalanalyses, atleast299pcsORFsarepresentintheS.cerevisiaegenome, manyofwhichhavepotentialorthologsinothereukaryotic species.Thisrepresentsasignificantpercentage(∼5%)ofthe amountofannotatedgenesinS.cerevisiae.Oftheseidentified pcsORFs,fourwereconfirmedtoproduceatranslationproduct and22seemtoregulategrowth[53].
Arabidopsiswasthefirstplantspeciesundergoinga thor-ough in silicoanalysisin thesearch ofnew sORF encoding genes.Becausecommongene-findingalgorithmshaveahard timeidentifyingsmall proteinproductsandare prone toa highnumberoffalsenegatives(asalreadymentionedinthe introduction)Hanadaetal.,developedtheCodingIndex(CI) measureforpcsORFpredictionbasedonthehexamer com-position bias, a general measure to distinguish CDS from non-CDS[62,63].ThisCImeasurewouldlaterformthebasis for a specific program package to identify sORFs, named sORFfinder[64].Afterperformingasixreadingframe trans-lationofintergenicsequencesofArabidopsisthaliana,pcsORFs wereonlyassignedasbeingcodingwhentheydemonstrated qualifyingCIvalues,abovebackgroundtilingarray hybridiza-tion intensities, evidence of purifying selection based on Ka/KsvaluesandoverlappingESTs.Usingthesecriteria,7159 pcsORFswithhighcodingpotential,ofwhich2376are sub-jecttopurifyingselection,wereidentifiedinA.thaliana[62,64]. Inarecentlypublishedfollow-up study,elaborating onthe functionofthesenewpcsORFs,anarraywasdesignedto gen-erateanexpressionatlasatseveraldevelopmentalstagesand undermultipleenvironmentalconditionsforthe7901 iden-tifiedpcsORFs [65]. 473pcsORFs showed ahigh number of homologsinotherplantspeciesandwereoverexpressed.49of thoseexpressedandsignificantlyconservedpcsORFsinduced variousmorphologicalchangesandvisiblephenotypiceffects
[65].
Arabidopsisisnottheonlyplantspeciessubjecttoan inte-grativeprocedure to identifypcsORFsatthe genome level. After obtaining ∼2.6 million expressed sequence tag (EST) readsfromaPopulusdeltoidesleaftranscriptome,full-length transcriptsfromtheESTsequencescouldbereconstructed. Using a computational approach based on coding poten-tial, evolutionary conservation and gene family clustering, andbyshowingevidenceofproteindomains,ncRNAmotifs, sequencelengthdistributionormass-spectrometrydata,at least56pcsORFencodinggenes(<200AAs)newtothe Popu-lusgenomeannotationcouldbeidentified[66].Veryrecently, work was published exploiting publicly available genome sequencesofPhaseolusvulgaris,Medicagotuncatula,Gycinemax andLotusjaponicusinasearchforpcsORFs(30–150AAs)[67]. Abioinformaticsanalysiswasperformedbasedonevidence ofexpression(transcription level), presence ofknown pro-teinregionsordomains(translationlevel)andidentification oforthologuesgenesintheexploredgenomes.Respectively 6170,10461,30521and23599pcsORFswereuncoveredwithin theP.vulgaris,G.max,M.truncatulaandL.japonicusgenomes. Based on specific EST expression analysis in P. vulgaris,
2336 of the identified pcsORFs showed evidence of gene expression.
In the animal kingdom, the first in silico and system-atic search fornewpcsORFswas carriedout forthemodel organismD.melanogaster[68]. Startingfromputatively non-coding euchromatic DNA, an initial set of 593 586 open readingframesbetween30and300basepairs(bps)longcould beidentified.UsingtBlastn,all pcsORFsshowingsignificant similarity withannotatedcodingsequences ortransposons were removed, at the same time only retaining pcsORFs showing significant amino acid sequence similarity with Drosophilapseudoobscura. Afterrealigningextendedversions oftheconservedpcsORFswithClustalW,anupperestimate of 4561 pcsORFs were identified in Drosophila. 72% of the inD.pseudoobscura conservedpcsORFs appearedtobetrue homologsasthey wereconservedin syntenicregionswith regard to the original D. melanogaster pcsORF. Only taking into accountsyntenicpcsORFs withfavorableKa/Ksvalues (having a ratio below 0.1), and with transcriptional evi-dence(basedoncombiningbothpubliclyavailableRNA-seq and tiling array data), the authors postulate that at least 401 functional sORFs exist in the D. melanogaster genome
[68].
Veryrecently,Crappéetal.combinedaninsilicoapproach and experimentalevidencebymeans ofribosomeprofiling data(seealsonextparagraph)foragenome-widesearchto detectnovelpcsORFsintheMusmusculusgenome[69].First, thegenomewasscannedforsORFswithhighcoding poten-tialusingthesORFfinderpackage.Secondly,acomprehensive feature matrix with peptide conservation measures, based on UCSC multiplespecies alignments,was constructed. In a thirdstep,the codingcapabilities ofthesepcsORFs were assessedbymeansofamachine-learningalgorithm. After-wards, the sORFs with a high coding score were verified forthepresenceofexperimentalribosomeprofilingsignals obtainedfrom mouse EmbryonicStem Cells(mESCs), hint-ing tosORF translation.Using thiscombined genome-wide approach dozens of both highly conserved and ribosome-targeted pcsORFs (possibly encoding micropeptides) were identified[69].
5.
sORF
identification
using
ribosome
profiling
Ribosome profiling is a recentlydescribed newstrategy to monitorproteinsynthesisbasedondeepsequencingof ribo-someprotected mRNAfragments[70–74]. By exploitingthe properties ofdrugsas harringtonine,puromycinor lactim-idomycin, stalling ribosomesat TranslationInitiation Sites (TIS),thestudyof(alternative)(a)TISwithsubcodonto single-nucleotide resolution is now possible [28,73,75–79]. In an attempt toprovidea genome-widemap ofprotein synthe-sis,Ingoliaetal.exploitedamachinelearningalgorithmon top oftheirribosomeprofiling datatosystematically delin-eate proteinproductsinmESCs [75]. Specialattention was paidtoarecentlyidentifiedandapparently abundantclass of RNAs, referred to as long non-coding RNAs (lncRNAs). TheselncRNAswerescannedfortranslatedregions,by defin-ingthemosthighlyribosome-occupied90nucleotidewindow
[80,81].Inthisway,Ingoliaetal.wereabletoidentifymany lncRNA regions, displaying high ribosomal occupancy and containingsmall open reading frames, classifying them as shortpolycistronicribosome-associatedcodingRNAs (sprcR-NAs)[75].Motivatedbythesefindingsandwhileperforminga globaltranslationinitiationanalysisusingribosomeprofiling (GTI-seq)inHEK293cells,Leeetal.alsospecifically charac-terizedthetranslationofncRNAs.Theywereabletoidentify 228ncRNAsassociatedwithGTI-seqsequencingreads,often overspanningevolutionaryconservedsORFs (medianlength of 54 nt) and frequently showing alternative initiation at non-AUGstartcodons[78].OurownresearchonmESC ribo-someprofilingdatastrengthenstheideathatsomelncRNAs actuallycontainputativelycodingsORFs.Whileinvestigating sORFswithinannotatedlncRNAregions, wewerealsoable todetectverywell-conservedandribosometargetedpcsORFs
[69].
Thequestionifandmorespecificallytowhatextent lncR-NAsactthrough theirtranslationalsORF productsremains upfordebateandisonethatwillnotfindananswerbased on ribosomeprofiling dataalone. For example, the mouse H19 lncRNA transcript functions as a true ncRNA [82,83], eventhoughdemonstratingribosomeoccupancy[75,84].This proves that simply ribosome profiling does not suffice as evidence of protein synthesis nor can be proposed as a fool-proofmethod todistinguishbetweencoding and non-coding transcripts [85,86]. In addition, one has to keep in mind thatspurious association ofribosomescould lead to translationalnoise[87].Thefactthatmostofthepredicted lncRNA transcriptsthat encode sORFs lack any significant conservation and that lncRNAs are rarely translated in humancell linesseemsconsistentwiththeseobservations
[80,81,88,89]. Inafollow upstudyGuttman etal.developed a metric, the ribosome release score (RRS), enabling sen-sitiveidentification offunctional protein-codingtranscripts based on the termination oftranslation atthe end ofthe ORF [90,91]. With this metric it is possible to discrimi-nate between protein-coding transcripts and other classes of non-coding transcripts, including lncRNAs. Because the classoflncRNAscloselyresembledtheribosomeoccupancy of other classes of non-coding transcripts with respect to this metric, it was deemed unlikely that lncRNAs, as a class, produce functional products [91]. Future measures will certainly be devisedto assess the true coding poten-tialofribosomeprofilingoccupiedmRNAonacasepercase basis.
Althoughmostresearchatthemomentpointstothetrue non-codingstateoflncRNAs,asubclasscouldstillcomprise pcsORFs,consideringthatsomeofthemhaveproventobe highlyconserved[69].Theabsenceofdetectablepeptide prod-uctsdoes furthermore notrule out their existence,as the expressionoflncRNAencodedsORFscould bevery specific intimeaswellasinspace[45,92].Anattractivehypothesis couldbethatlncRNAsare generallynon-coding, butunder specificcircumstances,enclosedsORFscanbetranslated (pre-sumablyatverylowlevels),thusrenderingtheselncRNAsas bifunctionalordualRNAs[31,32,93].Intheend,only scruti-nizedandfunctionalinvivoanalysiswillbeabletoproofifand whatlncRNAtranscriptsgiverisetofunctionalsmallprotein products[16,45,94].
6.
Detection
of
micropeptides
using
mass
spectrometry
Fewstudies existwheremassspectrometryisusedforthe direct detectionof micropeptides, althoughthis stillis the goldstandardwhenlookingforproteinorpeptideproducts. Usinganewlydevelopedstrategy,combiningpeptidomicsand massiveparallelRNA-seq,Slavoffet al.claimthediscovery of many previously uncharacterized human sORF-encoded polypeptides(SEPs)inK562(humanleukemia)cells[18].First, custom databaseswere constructed containingall possible polypeptidesbasedontheannotatedhumantranscriptome (RefSeq)and anexperimentalRNA-seqderivedK562 trans-criptome. Identifying the peptidomics mass spectrometry fragmentationspectra(MS/MS)usingthesecustom polypep-tidedatabasesandfourpreviouslyreportedSEPsaspositive controls,anextra86stilluncharacterizedSEPswere discov-ered, bringing the total ofunannotated humanSEPs to 90
[18,95].
Although this study is one of the early attempts to systematically identify micropeptides by means of mass spectrometry and subsequent peptide-to-spectrum match-ing strategy, they largely failed toprove the matureforms ofmicropeptides.Alternativestrategies,forwhichtheabove approachcouldserveasaguideline,shouldleadtothetrue identification of mature and native forms of endogenous micropeptidesasthisisstilloneofthemostimportantaspects ofpeptidomics.
7.
Outlook
Althoughtherearemanysophisticatedgeneprediction pro-grams available,themajority isoptimizedtopredict genes with 100 or more codons, rendering them inappropriate forsORF detection [96–98]. Development ofab initio single-sequence methods(basedoncodonpatterns)and discrimi-nativemetrics(pairwiseandmulti-speciesalignment-based comparativemetrics),suitedforthedetectionofsmallORFs, isstillinitsinfancy.Onshorterexons,comparativemetrics clearly outperform single-sequencebased methods, adding discriminatory power asadditionalspeciesare used.Using hybridmetrics,exploitingtherelativeindependenceoftheir inputmetrics,furtherincreasesperformance[94,98,99].Based on these findings, Crappéet al. combined several metrics, computedfromamulti-speciesalignmentandsubsequently builtaclassifiermodel(usingaSupportVectorMachine)to classifycodingversusnon-coding.Althoughthecombination ofdifferentmetricsgenerallyleadstobetterperformance,it isstilldependentonthecorrectnessofthemultiplesequence alignment,doesnotincorporatenear-cognatestartsitesand possiblymissesalotofhighlydivergentand/orquickly diverg-ingpcsORFs[69].Newandpromisingmetricswithregardto thisgrowingfieldofsORFdetectionwillcertainlyemerge.In this respect, PhyloCSFis certainlynoteworthy.Itisa com-parativegenomicsmethodthatanalyzesamultiplesequence alignmentusingphylogeneticcodonmodelstocorrectly dis-tinguish between protein-coding and non-coding regions. PhyloCSFclearlyoutperformsothermethodsfortheanalysis ofshortexons[100].
134
eupa open proteomics 3 (2014)128–137An integrated approach combining computational and experimentalvalidationstandsabetterchancetoresultin meaningfulfindingsthanmerelyperforminganinsilico pre-diction [94]. Slavoff et al. compiled a custom mRNA-seq derivedpolypeptidedatabasetoidentifymassspectrometry fragmentationspectraandwere abletoidentify86 unchar-acterizedSEPs[18].However,forreasonsalreadymentioned inthispaper,theribosomeprofilingtechniqueismore suit-ablethan mRNA-seq todelineate theexact ORFs and thus derive putative micropeptide sequences. Menschaert et al. prove that a ribosome profiling (RIBO-seq) derived custom databaseyieldsahighlyinformativesearchspace of trans-lationproductsforMS/MS-basedpeptideidentification[79]. AnautomatedpipelineconvertingRIBO-seqinformationinto a custom pcsORF sequence database, by delineating open reading framesfrom callingthe translation start sites and detectingSNP mutations,willprovetobeverybeneficial in futureMS-basedstudies.
Knownmicropeptideshavea verynarrow expressionin time as well as in space [45]. These characteristics are probablypartofthereasonwhytarsal-less,oneofthe best-studiedmicropeptidestodate,hasneverbeenidentifiedusing massspectrometry.Newandalternativeextractionmethods should prove tobe more effective at extracting cytoplasm boundmicropeptides[3].Forexample,Schwaidetal.recently reportedonanaffinity-basedapproachtoenrichand iden-tifycysteine-containinghumansORFencodedpolypeptides (ccSEPs)from cells.Using this approach theywere able to identify16novelccSEPs,derivedfromuncharacterizedsORFs
[101].Thedevelopmentofnewmassspectrometrybased tech-niques,suchasthereportedchemoproteomicapproach,will proveindispensableinordertoidentifyandcharacterizethe biologicalfunctionofmicropeptides.
Themereexistenceofapeptidedoesnotimplythatithas afunction.Evolutionaryconservationisdefinitelysuggestive forfunctionality,buttopinpointtheactualfunction, exper-imentaldemonstrationofabiologicaleffectisrequired[94]. Approachesusedtofunctionallydescribemicropeptidessuch astarsal-lessorsarcolambancanbeseenasageneralguidefor furtherinvivoanalyses[16,45].
8.
Conclusion
Researchonshortpeptides, encodedbysmallopenreading frames,isstillinitsinfancy.Nevertheless,growingevidence pointstotheexistenceoftheseso-calledmicropeptides,butto whatextentandhowimportantthisclassofnewbiomolecules is, still needs to be seen. Approaches integrating in silico, conservation-basedpredictionandacombinationofgenomic, proteomicandfunctionalvalidationmethodswillprovetobe indispensible tofurtherexplorethis micropeptide research field.
Acknowledgments
Thefinancialsupport oftheInstitute forthe Promotion of InnovationinFlanders(IWT)andtheBelgianNationalFund forScientificResearch(FWO-Flanders)isgratefully acknowl-edged. Dr. G. Menschaert is supported by a postdoctoral
fellowshipofFWO-Flanders,J.Crappéissupportedbya fel-lowshipofIWT-Flanders.
r
e
f
e
r
e
n
c
e
s
[1] FrickerLD.Neuropeptide-processingenzymes:applications fordrugdiscovery.AAPSJ2005;7:E449–55.
[2] BoonenK,CreemersJW,SchoofsL.Bioactivepeptides, networksandsystemsbiology.BioEssays2009;31:300–14.
[3] FrickerLD.Analysisofmousebrainpeptidesusingmass spectrometry-basedpeptidomics:implicationsfornovel functionsrangingfromnon-classicalneuropeptidesto microproteins.MolBioSyst2010;6:1355–65.
[4] NasselDR,WintherAM.Drosophilaneuropeptidesin regulationofphysiologyandbehavior.ProgNeurobiol 2010;92:42–104.
[5] HummonAB,AmareA,SweedlerJV.Discoveringnew invertebrateneuropeptidesusingmassspectrometry.Mass SpectromRev2006;25:77–98.
[6] BaggermanG,BoonenK,VerleyenP,DeLoofA,SchoofsL. PeptidomicanalysisofthelarvalDrosophilamelanogaster
centralnervoussystembytwo-dimensionalcapillaryliquid chromatographyquadrupoletime-of-flightmass
spectrometry.JMassSpectrom2005;40:250–60.
[7] KimSK,WijesekaraI.Developmentandbiologicalactivities ofmarine-derivedbioactiveperptides:areview.JFunct Foods2010;2:1–9.
[8] SarmadiBH,IsmailA.Antioxidativepeptidesfromfood proteins:areview.Peptides2010;31:1949–56.
[9] RohrigH,SchmidtJ,MiklashevichsE,SchellJ,JohnM. SoybeanENOD40encodestwopeptidesthatbindto sucrosesynthase.ProcNatlAcadSciUSA2002;99:1915–20.
[10] CassonSA,ChilleyPM,ToppingJF,EvansIM,SouterMA, LindseyK.ThePOLARISgeneofArabidopsisencodesa predictedpeptiderequiredforcorrectrootgrowthandleaf vascularpatterning.PlantCell2002;14:1705–21.
[11] FrankMJ,SmithLG.Asmall,novelproteinhighly conservedinplantsandanimalspromotesthepolarized growthanddivisionofmaizeleafepidermalcells.CurrBiol 2002;12:849–53.
[12] NaritaNN,MooreS,HoriguchiG,KuboM,DemuraT, FukudaH,etal.Overexpressionofanovelsmallpeptide ROTUNDIFOLIA4decreasescellproliferationandaltersleaf shapeinArabidopsisthaliana.PlantJ2004;38:699–713.
[13] KondoT,HashimotoY,KatoK,InagakiS,HayashiS, KageyamaY.Smallpeptideregulatorsofactin-basedcell morphogenesisencodedbyapolycistronicmRNA.NatCell Biol2007;9:660–5.
[14] GalindoMI,PueyoJI,FouixS,BishopSA,CousoJP.Peptides encodedbyshortORFscontroldevelopmentanddefinea neweukaryoticgenefamily.PLoSBiol2007;5:e106.
[15] SavardJ,Marques-SouzaH,ArandaM,TautzD.A segmentationgeneintriboliumproducesapolycistronic mRNAthatcodesformultipleconservedpeptides.Cell 2006;126:559–69.
[16] MagnyEG,PueyoJI,PearlFM,CespedesMA,NivenJE, BishopSA,etal.Conservedregulationofcardiaccalcium uptakebypeptidesencodedinsmallopenreadingframes. Science2013;341:1116–20.
[17] HashimotoY,KondoT,KageyamaY.Lilliputiansgetinto thelimelight:novelclassofsmallpeptidegenesin morphogenesis.DevGrowthDiffer2008;50(Suppl. 1):S269–76.
[18] SlavoffSA,MitchellAJ,SchwaidAG,CabiliMN,MaJ,Levin JZ,etal.Peptidomicdiscoveryofshortopenreading frame-encodedpeptidesinhumancells.NatChemBiol 2013;9:59–64.
[19] LeeRC,FeinbaumRL,AmbrosV,TheC.elegans heterochronicgenelin-4encodessmallRNAswith antisensecomplementaritytolin-14.Cell1993;75: 843–54.
[20] KozomaraA,Griffiths-JonesS.miRBase:integrating microRNAannotationanddeep-sequencingdata.Nucleic AcidsRes2011;39:D152–7.
[21] CarninciP,KasukawaT,KatayamaS,GoughJ,FrithMC, MaedaN,etal.Thetranscriptionallandscapeofthe mammaliangenome.Science2005;309:1559–63.
[22] DingerME,PangKC,MercerTR,MattickJS.Differentiating protein-codingandnoncodingRNA:challengesand ambiguities.PLoSComputBiol2008;4:1–5.
[23] FrithMC,ForrestAR,NourbakhshE,PangKC,KaiC,Kawai J,etal.Theabundanceofshortproteinsinthemammalian proteome.PLoSGenet2006;2:e52.
[24] HemmMR,PaulBJ,SchneiderTD,StorzG,RuddKE.Small membraneproteinsfoundbycomparativegenomicsand ribosomebindingsitemodels.MolMicrobiol
2008;70:1487–501.
[25] HemmMR,PaulBJ,Miranda-RíosJ,ZhangA,SoltanzadN, StorzG.SmallstressresponseproteinsinEscherichiacoli: proteinsmissedbyclassicalproteomicstudies.JBacteriol 2010;192:46–58.
[26] BoekhorstJ,WilsonG,SiezenRJ.Searchinginmicrobial genomesforencodedsmallproteins.MicrobBiotechnol 2011;4:308–13.
[27] HobbsEC,FontaineF,YinX,StorzG.Anexpandinguniverse ofsmallproteins.CurrOpinMicrobiol2011;14:167–73.
[28] Stern-GinossarN,WeisburdB,MichalskiA,LeVT,HeinMY, HuangSX,etal.Decodinghumancytomegalovirus.Science 2012;338:1088–93.
[29] vandeSandeK,PawlowskiK,CzajaI,WienekeU,SchellJ, SchmidtJ,etal.Modificationofphytohormoneresponseby apeptideencodedbyENOD40oflegumesanda
nonlegume.Science1996;273:370–3.
[30] GultyaevAP,RoussisA.Identificationofconserved secondarystructuresandexpansionsegmentsinenod40 RNAsrevealsnewenod40homologuesinplants.Nucleic AcidsRes2007;35:3144–52.
[31] UlvelingD,FrancastelC,HubeF.Whenoneisbetterthan two:RNAwithdualfunctions.Biochimie2011;93:633–44.
[32] BardouF,MerchanF,ArielF,CrespiM.DualRNAsinplants. Biochimie2011;93:1950–4.
[33] ChilleyPM,CassonSA,TarkowskiP,HawkinsN,WangKL, HusseyPJ,etal.ThePOLARISpeptideofArabidopsis regulatesauxintransportandrootgrowthviaeffectson ethylenesignaling.PlantCell2006;18:3058–72.
[34] LiuJ,MehdiS,ToppingJ,FrimlJ,LindseyK.Interactionof PLSandPINandhormonalcrosstalkinArabidopsisroot development.FrontPlantSci2013;4:75.
[35] LeJ,MalleryEL,ZhangC,BrankleS,SzymanskiDB. ArabidopsisBRICK1/HSPC300isanessential
WAVE-complexsubunitthatselectivelystabilizesthe Arp2/3activatorSCAR2.CurrBiol2006;16:895–901.
[36] DjakovicS,DyachokJ,BurkeM,FrankMJ,SmithLG. BRICK1/HSPC300functionswithSCARandtheARP2/3 complextoregulateepidermalcellshapeinArabidopsis. Development2006;133:1091–100.
[37] IkeuchiM,YamaguchiT,KazamaT,ItoT,HoriguchiG, TsukayaH.ROTUNDIFOLIA4regulatescellproliferation alongthebodyaxisinArabidopsisshoot.PlantCellPhysiol 2011;52:59–69.
[38] MaJ,YanB,QuY,QinF,YangY,HaoX,etal.Zm401,a short-openreading-framemRNAornoncodingRNA,is essentialfortapetumandmicrosporedevelopmentand canregulatethefloretformationinmaize.JCellBiochem 2008;105:136–46.
[39] WangaDX,LiaCX,ZhaoQ,ZhaoLN,WangMZ,ZhuDY, etal.Zm401p10,encodedbyananther-specificgenewith shortopenreadingframes,isessentialfortapetum degenerationandantherdevelopmentinmaize.Funct PlantBiol2009;36:73–85.
[40] DongX,WangD,LiuP,LiC,ZhaoQ,ZhuD,etal.Zm908p11, encodedbyashortopenreadingframe(sORF)gene, functionsinpollentubegrowthasaprofilinligandin maize.JExpBot2013;64:2359–72.
[41] TupyJL,BaileyAM,DaileyG,Evans-HolmM,SiebelCW, MisraS,etal.Identificationofputativenoncoding polyadenylatedtranscriptsinDrosophilamelanogaster.Proc NatlAcadSciUSA2005;102:5495–500.
[42] InagakiS,NumataK,KondoT,TomitaM,YasudaK,Kanai A,etal.Identificationandexpressionanalysisofputative mRNA-likenon-codingRNAinDrosophila.GenesCells 2005;10:1163–73.
[43] PueyoJI,CousoJP.The11-aminoacidlongTarsal-less peptidestriggeracellsignalinDrosophilalegdevelopment. DevBiol2008;324:192–201.
[44] PiH,HuangYC,ChenIC,LinCD,YehHF,PaiLM. Identificationof11-aminoacidpeptidesthatdisrupt Notch-mediatedprocessesinDrosophila.JBiomedSci 2011;18:42.
[45] KondoT,PlazaS,ZanetJ,BenrabahE,ValentiP,Hashimoto Y,etal.Smallpeptidesswitchthetranscriptionalactivityof ShavenbabyduringDrosophilaembryogenesis.Science 2010;329:336–9.
[46] FrankMJ,CartwrightHN,SmithLG.ThreeBrickgeneshave distinctfunctionsinacommonpathwaypromoting polarizedcelldivisionandcellmorphogenesisinthemaize leafepidermis.Development2003;130:753–62.
[47] GhabrialA.CodingRNAs:separatingthegrainfromthe chaff.NatCellBiol2007;9:617–9.
[48] TourE,McGinnisW.Gappeptides:anewwaytocontrol embryonicpatterning?Cell2006;126(3):448–9.
[49] JoliotA,ProchiantzA.Transductionpeptides:from technologytophysiology.NatCellBiol2004;6:189–96.
[50] NeijssenJ,HerbertsC,DrijfhoutJW,ReitsE,JanssenL, NeefjesJ.Cross-presentationbyintercellularpeptide transferthroughgapjunctions.Nature2005;434:83–8.
[51] LudwigA-K,GiebelB,Exosomes:.smallvesicles
participatinginintercellularcommunication.IntJBiochem CellBiol2012;44:11–5.
[52] ChuaCEL,LimYS,LeeMG,TangBL.Non-classical membranetraffickingprocessesgalore.JCellPhysiol 2012;227:3722–30.
[53] KastenmayerJP,NiL,ChuA,KitchenLE,AuWC,YangH, etal.Functionalgenomicsofgeneswithsmallopen readingframes(sORFs)inS.cerevisiae.GenomeRes 2006;16:365–73.
[54] KesslerMM,ZengQ,HoganS,CookR,MoralesAJ,Cottarel G.SystematicdiscoveryofnewgenesintheSaccharomyces cerevisiaegenome.GenomeRes2003;13:264–71.
[55] VelculescuVE,ZhangL,VogelsteinB.Serialanalysisof geneexpression.Science-AAAS-Weekly1995.
[56] BasraiMA,HieterP,BoekeJD.Smallopenreadingframes: beautifulneedlesinthehaystack.GenomeRes
1997;7:768–71.
[57] OlivasWM,MuhlradD,ParkerR.Analysisoftheyeast genome:identificationofnewnon-codingandsmall ORF-containingRNAs.NucleicAcidsRes1997;25:4619–25.
[58] BlandinG,DurrensP,TekaiaF,AigleM,Bolotin-Fukuhara M,BonE,etal.Genomicexplorationofthe
hemiascomycetousyeasts:4.ThegenomeofSaccharomyces cerevisiaerevisited.FEBSLett2000;487:31–6.
[59] BrachatS,DietrichFS,VoegeliS,ZhangZ,StuartL,LerchA, etal.ReinvestigationoftheSaccharomycescerevisiae
136
eupa open proteomics 3 (2014)128–137genomeannotationbycomparisontothegenomeofa relatedfungus:Ashbyagossypii.GenomeBiol2003;4:R45.
[60] CliftenP,SudarsanamP,DesikanA,FultonL,FultonB, MajorsJ,etal.FindingfunctionalfeaturesinSaccharomyces
genomesbyphylogeneticfootprinting.Science 2003;301:71–6.
[61] KellisM,PattersonN,EndrizziM,BirrenB,LanderES. Sequencingandcomparisonofyeastspeciestoidentify genesandregulatoryelements.Nature2003;423:241–54.
[62] HanadaK,ZhangX,BorevitzJO,LiWH,ShiuSH.Alarge numberofnovelcodingsmallopenreadingframesinthe intergenicregionsoftheArabidopsisthalianagenomeare transcribedand/orunderpurifyingselection.GenomeRes 2007;17:632–40.
[63] FickettJW,TungC-S.Assessmentofproteincoding measures.NucleicAcidsRes1992;20:6441–50.
[64] HanadaK,AkiyamaK,SakuraiT,ToyodaT,ShinozakiK, ShiuSH.sORFfinder:aprogrampackagetoidentifysmall openreadingframeswithhighcodingpotential.
Bioinformatics2010;26:399–400.
[65] HanadaK,Higuchi-TakeuchiM,OkamotoM,YoshizumiT, ShimizuM,NakaminamiK,etal.Smallopenreading framesassociatedwithmorphogenesisarehiddeninplant genomes.ProcNatlAcadSciUSA2013;110:2395–400.
[66] YangXH,TschaplinskiTJ,HurstGB,JawdyS,AbrahamPE, LankfordPK,etal.Discoveryandannotationofsmall proteinsusinggenomics,proteomics,andcomputational approaches.GenomeRes2011;21:634–41.
[67] GuillenG,Diaz-CaminoC,Loyola-TorresCA,Aparicio-Fabre R,Hernandez-LopezA,Diaz-SanchezM,etal.Detailed analysisofputativegenesencodingsmallproteinsin legumegenomes.FrontPlantSci2013;4:208.
[68] LadoukakisE,PereiraV,MagnyE,Eyre-WalkerA,CousoJP. Hundredsofputativelyfunctionalsmallopenreading framesinDrosophila.GenomeBiol2011;12:R118.
[69] CrappeJ,VanCriekingeW,TrooskensG,HayakawaE, LuytenW,BaggermanG,etal.Combininginsilico
predictionandribosomeprofilinginagenome-widesearch fornovelputativelycodingsORFs.BMCGenomics
2013;14:648.
[70] IngoliaNT,GhaemmaghamiS,NewmanJR,WeissmanJS. Genome-wideanalysisinvivooftranslationwith nucleotideresolutionusingribosomeprofiling.Science 2009;324:218–23.
[71] IngoliaNT.Genome-widetranslationalprofilingby ribosomefootprinting.MethodsEnzymol2010;470:119–42.
[72] GuoH,IngoliaNT,WeissmanJS,BartelDP.Mammalian microRNAspredominantlyacttodecreasetargetmRNA levels.Nature2010;466:835–40.
[73] IngoliaNT,BrarGA,RouskinS,McGeachyAM,WeissmanJS. Theribosomeprofilingstrategyformonitoringtranslation invivobydeepsequencingofribosome-protectedmRNA fragments.NatProtoc2012;7:1534–50.
[74] IngoliaNT,BrarGA,RouskinS,McGeachyAM,Weissman JS.Genome-wideannotationandquantitationof translationbyribosomeprofiling.CurrProtocMolBiol 2013;4.18:1–19.Chapter4:Unit418.
[75] IngoliaNT,LareauLF,WeissmanJS.Ribosomeprofilingof mouseembryonicstemcellsrevealsthecomplexityand dynamicsofMammalianproteomes.Cell2011;147:789–802.
[76] BrarGA,YassourM,FriedmanN,RegevA,IngoliaNT, WeissmanJS.High-resolutionviewoftheyeastmeiotic programrevealedbyribosomeprofiling.Science 2012;335:552–7.
[77] FritschC,HerrmannA,NothnagelM,SzafranskiK,HuseK, SchumannF,etal.Genome-widesearchfornovelhuman uORFsandN-terminalproteinextensionsusingribosomal footprinting.GenomeRes2012.
[78] LeeS,LiuB,HuangSX,ShenB,QianSB.Globalmappingof translationinitiationsitesinmammaliancellsat
single-nucleotideresolution.ProcNatlAcadSciUSA2012.
[79] MenschaertG,VanCriekingeW,NotelaersT,KochA, CrappeJ,GevaertK,etal.Deepproteomecoveragebasedon ribosomeprofilingaidsmassspectrometry-basedprotein andpeptidediscoveryandprovidesevidenceofalternative translationproductsandnear-cognatetranslation initiationevents.MolCellProteomics2013;12:1780–90.
[80] GuttmanM,AmitI,GarberM,FrenchC,LinMF,FeldserD, etal.Chromatinsignaturerevealsoverathousandhighly conservedlargenon-codingRNAsinmammals.Nature 2009;458:223–7.
[81] GuttmanM,GarberM,LevinJZ,DonagheyJ,RobinsonJ, AdiconisX,etal.Abinitioreconstructionofcell type-specifictranscriptomesinmouserevealsthe conservedmulti-exonicstructureoflincRNAs(vol.28,pp. 503,2010).NatBiotechnol2010;28:756–66.
[82] BrannanCI,DeesEC,IngramRS.TheproductoftheH19 genemayfunctionasanRNA.MolCellBiol
1990;10(1):28–36.
[83] CaiXZ,CullenBR.TheimprintedH19noncodingRNAisa primarymicroRNAprecursor.RNA2007;13:313–6.
[84] LiYM,FranklinG,CuiHM,SvenssonK,HeXB,AdamG, etal.TheH19transcriptisassociatedwithpolysomesand mayregulateIGF2expressionintrans.JBiolChem 1998;273:28247–52.
[85] GuttmanM,RinnJL.Modularregulatoryprinciplesoflarge non-codingRNAs.Nature2012;482:339–46.
[86] VoldersP-J,HelsensK,WangX,MentenB,MartensL, GevaertK,etal.LNCipedia:adatabaseforannotated humanlncRNAtranscriptsequencesandstructures. NucleicAcidsRes2013;41:D246–51.
[87] StruhlK.Transcriptionalnoiseandthefidelityofinitiation byRNApolymeraseII.NatStructMolBiol2007;14:103–5.
[88] BanfaiB,JiaH,KhatunJ,WoodE,RiskB,GundlingWEJ, etal.LongnoncodingRNAsarerarelytranslatedintwo humancelllines.GenomeRes2012;22:1646–57.
[89] DerrienT,JohnsonR,BussottiG,TanzerA,DjebaliS, TilgnerH,etal.TheGENCODEv7catalogofhumanlong noncodingRNAs:analysisoftheirgenestructure, evolution,andexpression.GenomeRes2012;22: 1775–89.
[90] JacksonRJ,HellenCUT,PestovaTV.Themechanismof eukaryotictranslationinitiationandprinciplesofits regulation.NatRevMolCellBiol2010;11:113–27.
[91] GuttmanM,RussellP,IngoliaNT,WeissmanJS,LanderES. Ribosomeprofilingprovidesevidencethatlargenoncoding RNAsdonotencodeproteins.Cell2013;154:240–51.
[92] BernsteinBE,BirneyE,DunhamI,GreenED,GunterC, SnyderM.AnintegratedencyclopediaofDNAelementsin thehumangenome.Nature2012;489:57–74.
[93] DingerME,GascoigneDK,MattickJS.Theevolutionof RNAswithmultiplefunctions.Biochimie2011;93:2013–8.
[94] KageyamaY,KondoT,HashimotoY.Codingvs.non-coding: translatabilityofshortORFsfoundinputativenon-coding transcripts.Biochimie2011;93:1981–6.
[95] OyamaM,Kozuka-HataH,SuzukiY,SembaK,Yamamoto T,SuganoS.Diversityoftranslationstartsitesmaydefine increasedcomplexityofthehumanshortORFeome.Mol CellProteomics2007;6:1000–6.
[96] DoJH,ChoiD.Computationalapproachestogene prediction.JMicrobiol2006;44(2):137–44.
[97] SleatorRD.Anoverviewofthecurrentstatusofeukaryote genepredictionstrategies.Gene2010;461:1–4.
[98] ChengH,ChanWS,LiZ,WangD,LiuS,ZhouY.Smallopen readingframes:currentpredictiontechniquesandfuture prospect.CurrProteinPeptSci2011;12:503–7.
[99] LinMF,DeorasAN,RasmussenMD,KellisM.Performance andscalabilityofdiscriminativemetricsforcomparative geneidentificationin12Drosophilagenomes.PLoSComput Biol2008;4:e1000067.
[100] LinMF,JungreisI,KellisM,PhyloCSF:.acomparative genomicsmethodtodistinguishproteincodingand non-codingregions.Bioinformatics2011;27:i275–82.
[101] SchwaidAG,ShannonDA,MaJ,SlavoffSA,LevinJZ, WeerapanaE,etal.Chemoproteomicdiscoveryof cysteine-containinghumansORFs.JAmChemSoc 2013;135(45):16750–3.
[102] LarkinMA,BlackshieldsG,BrownNP,ChennaR,
McGettiganPA,McWilliamH,etal.ClustalWandClustalX version2.0.Bioinformatics2007;23:2947–8.
[103] GoujonM,McWilliamH,LiW,ValentinF,SquizzatoS, PaernJ,etal.Anewbioinformaticsanalysistools frameworkatEMBL-EBI.NucleicAcidsRes2010;38: W695–9.
[104] SieversF,WilmA,DineenD,GibsonTJ,KarplusK,LiW, etal.Fast,scalablegenerationofhigh-qualityprotein multiplesequencealignmentsusingClustalOmega.Mol SystBiol2011;7:539.