JOURNAL 1982, 535-541 0022-538X/82/0535-07$02.00/0
Nucleotide
Sequence
of the
5'
Noncoding
Region
and Part of
the gag
Gene of Rous Sarcoma Virus
RONALDSWANSTROM,* HAROLDE.VARMUS,ANDJ. MICHAEL BISHOP
Department of Microbiology and Immunology, UniversityofCalifornia,San Francisco, California 94143
Received3 August1981/Accepted28September1981
Several functions ofthe retrovirus genome involve structural features in the
vicinity of its 5' terminus. In anefforttofurtherelucidatetherelationship between
structure andfunction inretrovirusRNA, we havedetermined the sequence of the
first 1,010nucleotides at the 5' endofthe genomeof Roussarcomavirusbyusing
theMaxam-Gilbert method to sequence suitable domains in cloned Roussarcoma
virus DNA. The results (i) locate the initiation codon for the gag gene of Rous
sarcomavirus 372nucleotidesfrom the 5' end of viralRNA;(ii) demonstrate that
thiscodonispreceded bythree methionine codons thatareapparentlynotused in
translation; (iii) sustain previous conclusions that the principal site to which
ribosomes bind on the Roussarcomavirus genome in vitro doesnotcontain the
initiation codon for gag; (iv) permitdeduction of the amino acid sequence of a
viralstructuralprotein, p19; (v) confirmtheamino-terminal sequence ofPr76505;
and (vi) substantiate the identification ofa splice donor site described in the
accompanying manuscript (Hackettetal.,J. Virol., 41:527-534, 1982).
The genomesof retroviruses assume multiple
roles during the viral life cycle, servingas
tem-plate forthe synthesis of viralDNAbyreverse
transcriptase, messenger for the synthesis of
several viral gene products, precursor for the
genesis of subgenomic mRNA's, andvehicle for
thetransmission of viralgenesfromonehost cell
to another (1). Each of these roles involves
structuralfeatures locatedinthevicinity of the
5' terminus ofthe viral genome: the initiation
site for reverse transcription (41); a nucleotide
sequence(R)repeatedatthe 5'and3'terminiof
viralRNAthat isrequiredfor chainpropagation
byreversetranscriptase(3a);aribosomebinding
siteandAUG codonrequiredtoinitiate
transla-tion (29, 45); a splice donor site used in the
genesis of subgenomicmRNA's (5, 8, 9, 11,25,
28, 30, 38, 46); and a nucleotide sequence of
uncertain size that is apparently necessary for
incorporation of viralRNAintomaturing virions
(23, 34).Tofacilitate the full elucidation ofthese
functionally important features, wehave
deter-minedthesequenceof1,010nucleotidesatthe 5'
endof the genome of the subgroup A
Schmidt-RuppinstrainofRous sarcoma virus(RSV). Our
experimentalapproachexploitedtheavailability
of molecularly cloned DNA derived from the
genome of subgroup A Schmidt-Ruppin RSV
(7).The results locate the initiation codon for the
gag geneofRSV;demonstrate that thiscodon is
precededby threeAUGcodons that are
appar-entlynotusedintranslation, permitdeduction of
the amino acid sequence ofa viral structural
protein (p19),andidentifyanucleotidesequence
thatis likelytobe the splicedonor sitemapped
intheaccompanying manuscript (13).
MATERALSANDMETHODS
Allanalyses were performed on DNA derived from themolecular cloningofthe genome of SR-ARSV. The viral genome was first cloned into the phage vectorAgtWESXB (2, 21) asdescribedpreviously(7). TheSRA-2cloneof viralDNA was used in thisstudy
(7).Suitable restriction fragmentswerethen subcloned into theplasmidvector pBR322 by conventional pro-cedures(3). Two subclones were used for the present work(Fig. 1). One (pPvuDG)contained thePvuII-D
and-GfragmentsfromRSV DNA, joined at theSacI
site to regenerate their normal orientation compared with the Sacd permuted SRA-2 clone (R. Parker,
personal communication); the other plasmid used (pBam C) contained the 1.35-kilobase-pair BamHI-C
fragmentof SRA-2DNA,which extends from position
525ontheviralgenome (the RNA capping site isused asposition1) to a pointwell within gag (7). Viral DNA
inthe subclones was sequenced by the procedure of Maxam and Gilbert (24). For this purpose, we pre-pared restriction fragments with 5' overhangs that permit end labeling either by repair synthesis with reverse transcriptase (39) or by T4 polynucleotide
kinase(24).
RESULTS
Strategy for determining the nucleotide
se-quence ofviralDNA. The RNA genome of RSV
is transcribed into DNA during the early hours
ofinfection(42). The first stable product of viral
535
on November 10, 2019 by guest
http://jvi.asm.org/
DNAsynthesis is alinear d
sive with the viral genol
addition, terminal redund
threedomains: U3, encodi
viral RNA; R, a small re(
eachend ofthe viral RNA
the 5' end of the viral I
domains arearranged in th4
to give the long termina
linear duplexes are then
circularmolecules of two
ing onecopy of theLTR se
contain two copies (17, 1
previously reported the is
cloning of thecircular DN.
study we have used the
DNA which contains tw(
sequence (7, 39).
Twosubclones of viral I
used for sequence deter
pPvu DG, a clone of the
R U3 ,U5
Capf
(-)PBBstEII Sac
Pvu 11
EcoRI BamHI
Xho Hinfl Sau3AI Hpa11
A A
A
pPvuDG
FIG. 1. Strategy for sequ
molecular subclones pPvu I
mapped with restriction endc
age sites useful in sequenci
Maxamand Gilbert (24). Th4 325nucleotides,permittingtl tinuous sequence. The doma
represented inthesequenced topofthedrawing. The dom explained in thetext.The 5'e denoted by the label Cap,
tRNAT`P by (-)PB. The pc
approximated from the data manuscript; the location of previous reports (35,44) and Hunter (personal communic the direction of sequencing restriction sites.
Luplex that is coexten- mentsthatspans the fused 3' and 5' ends of the
me, but that has, in RSV genome and that includes ca. 250
nucleo-lancies composed of tides from the 3' end of the RSVgenome, both
edatthe 3' endofthe copiesof the LTR sequence, andca.750
nucleo-dundancy encoded at tides from the 5' end of the viral RNA (7, 39);
i;
and U5, encodedat and(ii) pBam C, which overlaps with theright-RNA (17, 32). These ward domain of pPvu DG and extends from
eorder 5'-U3-R-U5-3' position525ontheRSV genome to a pointwell
1 repeat (LTR). The within gag (7). Restriction mapping of these
formed into closed subclones revealed thecleavage sites illustrated
sorts-those contain- in Fig. 1 and led to the sequencing strategy
-quence and those that summarized there. To assure accuracy, each
[8, 32, 39). We have region of viral DNAwaseither sequenced
inde-olation and molecular pendentlyfrommorethanonerestriction siteor
As ofRSV (7). Inthis sequencedrepeatedly from the samesite.
SRA-2 clone of viral Nucleotide sequence of the DNA. Figure 2
D copies of the LTR illustrates the sequence of 1,010nucleotides of
viral DNA from the recombinant clones and
DNAin pBR322 were identifies thelocation of allrecognizable
restric-mination (Fig. 1): (i) tionsites. Figure 3 presents the same sequence
PvuII-D and -G frag- as the plus strand of viral RNA. The sequence as
illustratedinboth figures begins at the 5'
termi-nus of the subgroup A Schmidt-Ruppin RSV
genome and extends in the 3' direction. Because
p19 plO? the 3' and 5' ends of the viral genome are fused
in thecloned DNA (see above), it wasnecessary
to deduce the location of the 5' terminus of the
RNA. This was easily done by reference to
previous analyses of other strains of RSV (12, A
15, 19, 37,
40)
andby
our own studiesusing
anadaptation of the chain-terminator sequencing
technique of Sanger and colleagues (31, 40) to
determine the sequence of therunoff DNA
prod-uct synthesized from the 5' terminus of
sub-group A Schmidt-Ruppin RSV RNA (data not
shown).
Functional aspects of the sequence. Several
* - - * important landmarks could be located on the
sequence.(i)Theterminallyredundantsequence
known as R has been characterized previously
11111111] and occupies positions 1
through
21(3a).
(ii)
ALiizzm
:11 sequence of 18 nucleotides(positions
102pBamC
c~~
through
119)
isbase-paired
with the 3' stemregion of tRNATIP to provide a primer for
re-encing viral DNA. The verse
transcriptase
(4,10,
39);
as aconsequence,DO and pBam C were viralDNA synthesis initiates at position 101 on
Dng
by the technique of the template (15, 37). (iii) Previous work hase twoclones overlap by
identified
aribosome
binding site that includesheconstruction of a con- the AUG atposition41 and
neighboring
nucleo-ins of the viral genome tides, positions 9through 53 (6); paradoxically,
DNA are depicted at the this AUG apparently does not serve for the
nains U3, R, and US are initiation of translation (see below). (iv) The
-nd of theviral genomeis beginning of the gag gene was identified by
the site of binding for searching for an AUG followed by an extensive
rported
in9heapsresent
openreading
frame(Fig.
4) andby
reference toplO is based largely on the previously determined amino acid sequence
unpublished data of E. at the amino terminus of the gag gene product
-ation). Arrows indicate (27). A suitable AUG was found at position 372
employed at individual (Fig. 4)andwasfollowedbythepredictedamino
acid sequence (see
below).
(v) In theaccompa-*
on November 10, 2019 by guest
http://jvi.asm.org/
[image:2.490.57.250.313.511.2]SEQUENCE
Hph I HgiAI EcoRIl HaeIII Asu I Hinfl HgiAl
GCCATTTGACCATTCACCACATTGGTGTGCACCTGGGTTGATGGCCGGACCGTTGATTCCCTGACGACTACGAGCACCTG
I ~~~~IHpa11Ava 11
20 40 60 80
BstEII Hae III Mbo I
CATGAAGCAGAAGGCTTCATTTGGTGACCCCGACGTGATAGTTAGGGAATAGTGGTCGGCCACAGACGGCGTGGCGATCC
HphI III
100 120 140 160
Mnl I Taq I EcoRIl Mnl I Bbv I Fnu4HI Mnl I AluI
TGTCTCCATCCGTCTCGTCTATCGGGAGGCGAGTTCGATGACCCTGGTGGAGGGGGCTGCGGCTTAGGGAGGCAGAAGCT
180ltsu 200200 Fnu4HI Dde I
~~~~~~~~~~~~~~~~~~~240
Rsa I Mnl IHgiAI A/u I EcoRII Asu I Asu I Dde I Mbo11
GAGTACCGTCGGAGGGAGCTCCAGGGCCCGGAGCGACTGACCCCTGCCGAGAACTCAGAGGGTCGTCGOAAGACGGAGAG
Dde I Sac HaeIIIHpa1 I2 Mnl I 3
260 280 n/I320
EcoRIl Hae III Mbo I HphI
TGAGCCCGACGACCACCCCAGGCACGTCTTTGGTCGGCCTGCGGATCAAGCATGGAAGCCGTCATTAAGGTGA1TTCGTC
360 380 400
Tha I Dde I Asu I
CGCGTGTAAAACCTATTGCGGGAAAATCTCTCCTTCTAAGAAGGAAATAGGGGCCATGTTGTCCCTGTTACAAAAGGAAG
I ~~~~~~~~~~HaeIII
420 440 460 480
Mnl I Hpa11 Asu IEcoRlI Xho 11 Fnu4HIH/haI
GGTTGCTTATGTCTCCCTCAGATTTATATTCTCCGGGGTCCTGGGATCCCATCACTGCGGCGCTCTCCCAGCGGGCAATG
Dde I Ava 11 BamHIMbo I HaeI1 560
Rsa I EcoRII Fnu4HI Xho I Mnl I
GTACTTGGAAAATCGGGAGAGTTAAAAACCTGGGGATTGGTTTTGGGGGCATTGAAGGCGGCTCGAGAGGAACAGGTTAC
I ~~~~~~~~~~IAvaITaq II
580 600 620 640
Dde I Mnl IMnl I EcoRIIAsuIHpa11 TaqI A/uI
ATCTGAGCAAGCAAAGTTTTGGTTGGGATTAGGGGGAGGGAGGGTCTCTCCCCCAGGTCCGGAGTGCATCGAGAAACCAG
660 680 Ava1172
Bbv I Mbo 11 Hae 11 Hha I Mnl I CTACGGAGCGGCGAATCGACAAAGGGGAGGAGGTGGGAGAAACAACTGTGCAGCGAGATGCGAAGATGGcGCCAGAGGAA
Hinfl MnlI Fnu4HI AcyI 8
740 760 780 800
Fnu4HI Pvu 11 BbvI Hha I
GCGGCCACACCTAAAACCGTTGGCACATCCTGCTATCATTGCGGAACAGCTGTTGGCTGCAATTGCGCCACCGCCACAGC
Hae III Alu I Fnu4HI I8
820 840 880
Mnl I Hae IIIMnlI EcoRII Fnu4HI Asu IEcoRII
CTCGGCCCCTCCTCCCCCTTATGTGGGGAGTGGTTTGTATCCTTCCCTGGCGGGGGTGGGAGAGCAGCAGGGCCAGGGAG
920 940 Bbv I HaeIII 960
AvaI Fnu4HI MnlI EcoRil Tha I ATAACACGTCTCGGGGGCGGAGCAGCCAAGGGAGGAGCCAGGGCACGCGG
BbvI 1010
FIG. 2. Nucleotide sequenceof cloned viral DNA. Thesequenceisnumbered from the5' end of the viral genome and depicts the same polarity as the genome ("positive strand"). Identifiable sites of cleavage by restrictionendonucleaseswerelocated by computer-assisted search. Identifiable regions within thesequence, as
described in thetext,are asfollows:R, 1-21; US, 22-101; (-)PB, 102-119;startofgagandp19, 372; end of p19,
902.
nying manuscript (13), a splice donor site has
beenmappedtothevicinity of position 390;an excellentfacsimile of thecanonical splice donor
sequence (22) occupies positions 387 through
395 (see below; Fig. 5).
Although the sequence given here is more
thanampletoaccommodate theentirety of p19,
wecannotpresently identify the carboxy
termi-nus of the protein with certainty. Since p19 is
but one of five proteins encoded by gag in a
singlepolyprotein (with the probable order p19-plO-p27-p12-pI5;seereferences 35 and 44;
per-sonalcommunication from E. Hunter), the
car-boxy terminus of p19 is not demarcated by a
termination codon. On the basis ofincomplete datadescribingamino acidsequencesin p19and
other gag proteins, however, we suggest that p19mayterminate with amino acid residue 177
(tyrosine) in Fig. 3 (see below).
DISCUSSION
Authentication of the nucleotidesequence. We presenthere the sequenceof1,010 nucleotides,
beginningatthe5' terminus of theRSVgenome
andextending through the p19 domain of thegag gene. We have anumber ofreasons tobelieve that our identification and sequencing of the
viral DNA have notbeen affected by anylarge errors. First, the subclone pPvu DG includes
bothcopies of the LTRsequenceand displaysat
least one of the functional activities of this domain-the ability to direct the initiation of
Fnu4HI Taq I Mnl I
Asu I Mnl I
9VW
VOL.41, 1982
on November 10, 2019 by guest
http://jvi.asm.org/
[image:3.490.47.442.62.445.2]SWANSTROM, VARMUS, BISHOP
GCCAUUUGACCAUUCACCACAUUGGUGUGCACCUGGGUUGAUGGCCGGACCGUUGAUUCCCUGACGACUA
CGAGCACCUGCAUGAAGCAGAAGGCUUCAUUUGGUGACCCCGACGUGAUAGUUAGGGAAUAGUGUUCUGUCACAGACGUG GUGGCGAUCCUGUCUCCAUCCGUCUCGUCUAUCGGGAGGCGAGUUCGAUGACCCUGGUGGAGGGGGCUGCGGCUUAGGGA
GGCAGAAGCUGAGUACCGUCGGAGGGAGCUCCAGGGCCCGGAGCGACUGACCCCUGCCGAGAACUCAGAGGGUCGUCGGA
met glu ala AGACGGAGAGUGAGCCCGACGACCACCCCAGGCACGUCUUUGGUCGGCCUGCGGAUCAAGC AUG GAA GCC
10 20
val ile lys val ile ser ser ala cys Iys thr tyr cys gly lys ile ser pro ser lys GUC AUU AAG GUG AUU UCG UCC GCG UGU AAA ACC UAU UGC GGG AAA AUC UCU CCU UCU AAG
30 40
Iys glu ile gly ala met leu ser leu leu gin Iys glu gly leu leu met ser pro ser AAG GAA AUA GGG GCC AUG UUG UCC CUG UUA CAA AAG GAA GGG UUG CUU AUG UCU CCC UCA
50 60
asp leu tyr ser pro gly ser trp asp pro ile thr ala ala leu ser gin arg ala met
GAU UUA UAU UCU CCG GGG UCC UGG GAU CCC AUC ACU GCG GCG CUC UCC CAG CGG GCA AUG
70 80
val leu gly lys ser gly glu leu lys thr trp gly leu val leu gly ala leu lys ala GUA CUU GGA AAA UCG GGA GAG UUA AAA ACC UGG GGA UUG GUU UUG GGG GCA UUG AAG GCG
90 100
ala arg glu glu gin val thr ser glu gin ala Iys phe trp leu gly leu gly gly gly GCU CGA GAG GAA CAG GUU ACA UCU GAG CAA GCA AAG UUU UGG UUG GGA UUA GGG GGA GGG
110 120
arg val ser pro pro gly pro glu cys ile glu lys pro ala thr glu arg arg ile asp
AGG GUC UCU CCC CCA GGU CCG GAG UGC AUC GAG AAA CCA GCU ACG GAG CGG CGA AUC GAC
130 140
lys gly glu glu val gly glu thr thr val gin arg asp ala lys met ala pro glu glu
AAA GGG GAG GAG GUG GGA GAA ACA ACU GUG CAG CGA GAU GCG AAG AUG GCG CCA GAG GAA
150 160
ala ala thr pro lys thr val gly thr ser cys tyr his cys gly thr ala val gly cys
GCG GCC ACA CCU AAA ACC GUU GGC ACA UCC UGC UAU CAU UGC GGA ACA GCU GUU GGC UGC
170 180
asn cys ala thr ala thr ala ser ala pro pro pro pro tyr val gly ser gly leu tyr
AAU UGC GCC ACC GCC ACA GCC UCG GCC CCU CCU CCC CCU UAU GUG GGG AGU GGU UUG UAU
190 200
pro ser leu ala gly val gly glu gin gin gly gin gly asp asn thr ser arg glV arg
CCU UCC CUG GCG GGG GUG GGA GAG CAG CAG GGC CAG GGA GAU AAC ACG UCU CGG GGG CGG 70
150
230 310
380
440
500
560
620
680
740
800
860
920
980 210
ser ser gin gly arg ser gin gly thr arg
AGC AGC CAA GGG AGG AGC CAG GGC ACG CGG
FIG. 3. Proposedaminoacidsequence for thep19domain of RSVgag. The sequence presented in Fig. 2 is hererewrittenasviralRNA,beginningatthe 5'terminus of the RSV genome. Theunderliningdenotes the siteof
bindingfortRNAT1p.Anopenreadingframe that may encodep19wasidentified as described in the text and in thelegendtoFig.4and usedtodeduceaproposedamino acid sequence for thep19domain of gag.
RNA synthesis both in vitro (W. DeLorbe,
personal communication) and in vivo (7; P.
Luciw,personalcommunication). Second, both subclones have been mapped extensively with restriction endonucleases (7; Fig. 1). The se-quence illustrated in Fig. 2 contains all of the
sites predicted by these previous analyses. Third,ourpresentsequenceof theU5domain is
in agreement with previous results obtained from viralDNAsynthesizedinvitro(15, 37, 40). Fourth, thesequence correctly locates and rep-resentsthepreviouslycharacterized binding site for tRNATrP (4, 10). Fifth, the viral DNA was
derived fromareplication-competent virus, and
the clonedDNAisinfectious (7). Sixth, wecan
deduce fromoursequence thepreviously
deter-mined amino acid sequence from the amino
terminus of the gag gene (27); the sequence
containsasingleopenreading frame inthe gag
region.
Locating gag proteins on the nucleotide se-quence. The initial product oftranslation from
gag appears to be Pr76gag (29, 45), a
76,000-dalton protein whose amino-terminal sequence
has been determined to be:
met-glu-ala-val-ile-lys-val-x-x-ala-x-lys (27). A virtually identical
on November 10, 2019 by guest
http://jvi.asm.org/
NUCLEOTIDE SEQUENCE OF RSV 539
1 7 39 41
r. 54 o 62
4. 82
'H 83
S
o 105
"I 116
a 119 X 123 o' 130
0
U) 198 o 199
z 225
240
278
321
Reading FramE
2 3
OP 372
OP 386
AUG 391
OP 407
OP 437
AUG 448
OP 456
OP 489
OP 558
AM 583
AM 613
AM 644
AUG 670
OP 778
AM 786
OP 812
OP 901
OP 962
FIG. 4. Analysis ofreading frar
initiation andtermination codons we of the threereading frames; frame first nucleotide at the 5' terminus of 1 3), frame 2 beginswith the second,a
with the third. AUG, Methionine-; codon; AM, amber termination co ochreterminationcodon (UAA); OP codon (UGA). The AUG codon N
initiates thegaggene inframe 3 is
sequence is encoded by nucleot
to 410 in RSV RNA(Fig. 3), the
cy being the presence of thr
between valandala rather than t
unexplained discrepancy, we p
virtualidentity suffices to locate
the gag gene on thenucleotide
ThecleavageofPr76gaggives
proteins whose order within
precursor isprobably: p19-p10-l
44; personal communication fr
The carboxy terminus ofp19 is
by atermination codon, but we
todeduce theapproximatelocat
nus to be position 177 by usi
findings ofE. Hunter(personal c
(i) On the basis of carboxypept
p19ends intyrosine. Giventhen
ofp19,thetyrosine residueatpi
or 183 might represent the ca
(Fig. 3). (ii) The tyrosine at resi
to lie withinplO (E. Hunter, per
cation). (iii) Carboxypeptidase
fromp19 and thenfails toprog
the protein (E. Hunter, persoi
tion). The sequence before the
due 155 should be fully suscepi
sis. The proline residue tha
tyrosine at position 177 would
releaseof thetyrosine (14), mak
the likely carboxy terminus of
identification of the carboxy
tern
the exactlocalization of plO await further
analy-1 2 3 sis of amino acid sequence in the isolated
pro-AUG teins.
oc Ribosome
binding
and theinitiationoftransla-OP tion from
gag.
Translation from the mRNA's ofoc
eucaryotic
organisms
generally
initiatesonly
inOC the
vicinity
ofthe 5' terminusofthe RNA(M.
AUG Kozak, in A. Shatkin, ed., Current Topics in
AUG Microbiology and Immunology, in press).
Sher-AUG man and Stewart
(36)
and Kozak(20)
haveOP proposed that ribosomes inevitably bind at or
OP near the 5' end ofeucaryotic mRNA's and then
AUG "travel" to the first AUG downstream, where
AUG translation initiates. Previousfindingswith RSV
oc were in accord with these views: translation of
AUG the intact RSV genome initiates
only
at gag, thegenelocated closest to the 5' end of the genome
nes. All potential (29, 45); and a strong ribosome binding site has
-relocated ineach been identified that includes the first AUG
1begins with the downstream from the 5' end of RSV RNA (6).
thesequence
(F.ig
Paradoxically, however, the identified ribosomeotfenti beinitiation
binding
site doesnotrepresent
thesiteofinitia-pton
(UAG);oCn
tionfortranslationfromgag.Thisparadox
was',opaltermination apparent from a comparison of previous
analy-which we deduce sesof the nucleotide sequencein theU5domain
underlined. (15, 37) and the amino-terminal sequence of
Pr76rar
(27), and the paradox isfully manifest inourpresent data. Theinitiation codon for gag is
ideresidues 372 preceded by three AUG codons, eachofwhich
only discrepan- is followed shortly and in frame by one or more
ee amino acids termination codons (Fig.4). Ittherefore appears
wo.Despite this that the site to which ribosomesinitiallybind in
resume that the vitro at the 5' end of theRSV genomedoes not
thebeginning of of itself dictate theposition at whichtranslation
sequence. starts. These findings add another example to
risetofiveviral thegrowing list of eucaryotic mRNA's which do
the polyprotein not initiate translation at the first methionine
p27-p12-p15 (35, codon downstream from the 5' end ofthe RNA
om E. Hunter). (26; Kozak, in press).
notdemarcated Kozak has recently compiled the nucleotide
have been able sequences that adjoin initiation codons in
eu-Lionofthetermi- caryotic mRNA's and has found that adenosine
ing unpublished usually occurs in the third position upstream
Iommunication). from the AUG (83% ofavailable examples), and
idase treatment, guanosine occurs at the first position
down-nolecularweight stream from the AUG (63% of the available
osition155, 177, examples) (Kozak, inpress). The significance of
Lrboxy terminus
idue 183 appears rsonal
communi-cleavestyrosine
ress further into
nal
communica-tyrosineat
resi-tible to
hydroly-Lt precedes the
allow only the
ingthistyrosine
rp19. Definitive
minusofp19and
372 389
AUG GAA GCC GUC AUU AAG GUG AUU 5' end of gag
AAGUAG splice donor
CAG
GUA AGU consensus sequenceFIG. 5. Identification of a potential splice donor site inRSVRNA. Data described in theaccompanying manuscript (13) locate a splice donor site in the vicinity of residue 390 of the RSV genome. The sequences adjoining this position are here aligned withthe"consensus sequence" for splice donor sites, compiled from a large number of viral and cellular mRNA's(22).
VOL.41, 1982
on November 10, 2019 by guest
http://jvi.asm.org/
[image:5.490.46.241.61.238.2]these findings is not known, but the same
fea-tures do occur in the sequences adjoining the
initiation codon for gag of RSV (Fig. 3). One
possibledistinctionbetween theAUG codons in
the 5'noncoding region of RSV and the initiation
codon of gag is that theformer are preceded by
a U at position -3 from the AUG. Of the 56
cellular mRNA sequences reviewed by Kozak (in press), none has a pyrimidine at -3 from the
initiationcodon. Twoof these RNAs have AUG
codons in the 5' noncodingregion;in each case
theAUG ispreceded by a C at position -3 from
the AUG. The presenceof a purine or a
pyrimi-dine at position -3from an AUG codon may be
one of thesignals acellnormally uses to
distin-guish between correct and incorrect initiation sites. Further examples are needed to clarify this correlation.
Splicing in the genesis of RSV mRNA's. The subgenomic mRNA's of RSV are formed by
splicingashortnucleotide sequence from the 5'
endoftheviral genome toatleasttwopositions
within the genome (5, 20a, 25, 38, 46). In the
accompanying manuscript(13), wehave located
the "donor" sitefor splicing approximately 18
nucleotides downstream from the beginning of
thegag gene.Thispositioniscontained withina
nucleotidesequence thatdisplaysclose
similar-ity to the previously enunciated "consensus
sequence" for splice donor sites in eucaryotic
mRNA's (Fig. 5). According tothe consensus
sequence, the position of the splice would be
located between residues 389 and 390 of RSV
RNA (Fig. 5), a deduction which is in exact
accordwiththe resultsobtained by mappingthe
splice donor site with S1 nuclease(13) andwhich
adds credence to those results.
If we havecorrectly located thesplice donor
site in RSVRNA, theinitiation codon for gag is
spliced onto the subgenomic mRNA's of the
virus. Mightthis codon be used for initiation of
translation in its transposed positions? The
questioncannotbe answered withassurance as
yet because the splice acceptor sitesin theenv
and src subgenomic mRNA's have not been
located,although indirect evidence suggeststhat
theinitiationcodon for the srccoding regionlies
outside of the spliced leader sequence (13). In
contrast toRSV,thesplicedonor site for theenv
mRNAof mouse mammarytumor virusappears
tolieupstream from the startofgag;there are
no AUG codons in the leader sequence (J.
Majors, personalcommunication).
Mapping 5' noncoding regions of avian
retro-viruses.Thesequence analysisof the 5' terminus
of RSVhasprovidedapotentiallyusefulbattery
ofrestrictionenzyme sitesthatcouldbeapplied
to rapidly mapping other avian retrovirus
iso-lates.Inparticular,a BstEIIsite(position 106)is
present within the sequence of the tRNATrp
binding site [labeled (-)PB in Fig. 1], and a
BamHI site (position 525) and an XhoI site
(position 625) are present within the coding
region of p19 (Fig. 1 and 2). All three sites have
been documentedby nucleotide sequence
analy-sis in avianerythroblastosis virus (M.Privalsky,
personalcommunication), and they map to
anal-ogous positions in the endogenous viral locus,
evl(16). The BamHI and XhoI sites are present
inthe DNA of RSV Prague strain, RAV-O, and
MC29 viruses (32, 33, 43). These three
restric-tion enzyme sites willprobably be reliable
mark-ers for quickly identifying the position of the
tRNATrP binding site and mapping the start of
the gag sequences in otherisolates.
ACKNOWLEDGMENTS
We thank Bill DeLorbe and Richard Parker for making clonedRSV DNAs available and JohnMajors, Martin Pri-valsky, DennisSchwartz, and Eric Hunter for communicating results before publication. We also thank Bertha Cook forhelp in preparing the manuscript. The computer analysisof the sequence wasdone with theassistance of Peter Czernilofsky and Hugo Martinez.
R.S. was supportedbyPublic Health Service Training grant 1T32 CA 09043 from the National Institutes of Health. The research wassupported by Public Health Service grants CA 12705and CA 19287 from the National Institutes of Health and
by American Cancer Society grant MV48G to J.M.B. and H.E.V.
LITERATURECITED
1. Bishop, J. M.1978.Retroviruses. Annu. Rev. Biochem. 47:35-88.
2. Blattner, F. R., A. E. Blechl, K.Kenniston-Thompson, H. E.Faber, J. E.Richards, J. L. Sightom, P. W. Tucker, and 0. Smithies.1978.Cloninghuman fetalglobinand mouse a-typeglobin DNA: Preparation and screening of shotgun preparations. Science 202:1279-1284.
3.BoHvar,F., andK.Backman.1979. Plasmids of Escheri-chia coli asCloningvectors.MethodsEnzymol. 68:245-266.
3a. Coffin,J. M.1979.Structure,replication, and recombi-nation of retrovirusgenomes: someunifyinghypotheses. J. Gen. Virol. 42:1-26.
4. Cordell,B., E.Stavnezer, R. Friedrich,J.M.Bishop, and H. M. Goodman. 1976. The nucleotide sequence that binds primer forDNA synthesis to the avian sarcoma
virusgenome. J.Virol. 19:548-558.
5. Cordell,B., S. R. Weiss, H. E.Varmuw,and J. M. Bishop. 1978. At least 104nucleotidesaretransposed from the 5' terminus of theavian sarcoma virus genome to the 5' terminiofsmallerviral mRNAs. Cell15:79-91.
6. Darlix,J.-L., P.-F. Spahr, P. A.Bromley, and J.-C. Jaton. 1979. Invitro,the major ribosome binding site on Rous
sarcoma virus RNA does not contain the nucleotide sequencecoding for the N-terminal amino acids of the gag gene product. J. Virol. 29:597-611.
7. DeLorbe, W. J., P. A. Luciw, H. M. Goodman, H. E. Varmus, andJ.M.Bishop. 1980. Molecular cloning and characterization of avian sarcoma virus circular DNA molecules. J. Virol.36:50-61.
8. Donoghue, D. J.,E. Rothenberg, N.Hopkins, D. Balti-more, and P. A.Sharp. 1978. Heteroduplex analysis of the nonhomology region between Moloney MuLV and the dual host rangederivative HIX virus.Cell14:959-970. 9. Donoghue, D. J., P. A. Sharp, and R. A. Weinberg. 1979.
An MSV-specific subgenomic mRNA in MSV
trans-formed G8-124 cells.Cell17:53-64.
10. Eiden,J.J.,K.Quade, andJ.L.Nichols. 1976. Interaction
on November 10, 2019 by guest
http://jvi.asm.org/
NUCLEOTIDE SEQUENCE OF RSV 541
oftryptophantransfer RNAwith Rous sarcoma virus 35S RNA. Nature(London)259:245-247.
11. FaBer,D.V.,J. Rommelaere, and N. Hopkins. 1978. Large
Tioligonucleotidesof Moloney leukemia virus missing in anenvgenerecombinant,HIX,arepresent on an intracel-lular 21SMoloneyvirus RNA species. Proc. Natl. Acad. Sci.U.S.A.75:2964-2968.
12. Furuichi, Y., A. J. Shatkin, E. Stavnezer, andJ. M.
Bishop. 1975. Blocked, methylated 5' terminal sequence in avian sarcomavirusRNA. Nature(London) 257:618-620.
13. Hackett,P.H.,R.Swanstrom, H. E.Varmus, and J. M.
BIhop. 1982. The leader sequence of the subgenomic
mRNA's ofRous sarcoma virus is approximately 390
nucleotides. J.Virol.41:527-534.
14. Hartsuck, J.A., and W. N. Lipscome. 1971. Carboxypepti-dase A. In P. D.Boyer (ed.),Theenzymes, vol. 3, p. 1-56. AcademicPress, Inc.,New York.
15. Haseltine, W.A., A. M. Maxam, and W. Gilbert. 1977. Roussarcomavirus genome isterminally redundant: the 5'sequence. Proc. Natl. Acad. Sci. U.S.A.74:989-993. 16.Hlishinuma,F.,P.J. DeBona, S.Astrin, and A. M. Skalka.
1981.Nucleotidesequenceof acceptor site and termini of
integrated avian endogenous provirus evl: integration createsas6bprepeatof host DNA.Cell23:155-164. 17. Hsu,T.W., J.L.Sabran, G. E. Mark, R. V.Guntaka, and
J. M.Taylor.1978.Analysisofunintegrated avian RNA
tumorvirus double-strandedDNAintermediates.J. Virol.
28:810-818.
18.Ju, G., and A. M. Skalka. 1980. Nucleotide sequence
analysis of the long terminal repeat (LTR) of avian
retroviruses: structuralsimilaritieswithtransposable ele-ments.Cell22:376-386.
19. Keith,J.,and H.Fraenkel-Conrat. 1975. Identificationof the 5'end of Rous sarcomavirus RNA. Proc. Natl. Acad. Sci. U.S.A. 72:3347-3350.
20. Kozak, M. 1978. How do eucaryotic ribosomes select initiationregionsin messenger RNA? Cell15:1109-1123. 20a. Krzyzek,R.A.,M.S.Collett,A.F.Lau, M. L. Perdue, J.
P. Leis, and A.J. Faras.1978. Evidence for splicingof aviansarcomavirus5'-terminalgenomicsequences onto
viral-specific RNA in infected cells. Proc. Natl. Acad.
Sci. U.S.A.75:1284-1288.
21. Leder, P.,D.Tiemeler,andL.Enquist. 1977. EK2
deriva-tives ofbacteriophage lambda useful in the cloning of DNA fromhigher organisms: theXgtWES system. Sci-ence196:175-177.
22. Lerner,M.R., J.A.Boyle,S. M.Mount, S. L.Wolin,and
J. A. Steitz. 1980. Are snRNP's involvedin splicing?
Nature(London)283:220-224.
23. Linial, M., E. Medeiros,and W.S. Hayward. 1978. An avianoncovirusmutant(SE21Q1b) deficientingenomic
RNA: biological and biochemical characterization. Cell 15:1371-1381.
24. Maxam, A.,and W.Gilbert. 1980.Sequencingend-labeled DNA with base-specific chemical cleavages. Methods
Enzymol.65:499-560.
25. Mellon,P.,and P. H.Duesberg.1977.Subgenomic, cellu-lar Rous sarcomavirus RNAs contain oligonucleotides
from the 3' half and the 5' terminus of virion RNA. Nature
(London)270:631-634.
26. Mulligan, R. C., andP. Berg. 1981. Factorsgoverning expressionofabacterialgene in mammaliancells. Mol.
Cell. Biol.1:449-459.
27. Palmiter,R.D., J. Gagnon,V.M.Vogt, S.Ripley, and R.
N. Elsenman. 1978. The NH2-terminal sequence of the avian oncovirusgagprecursorpolyprotein(Pr769af).
Vi-rology91:423-433.
28. Panet,A., M.Gorecki, S. Bratosin, and Y.Aloni. 1978. Electron microscopic evidencefor splicing ofMoloney murine leukemia virus RNAs.NucleicAcids Res.
5:3219-3236.
29. Pawson, T.,G. S.Martin,andA. E.Smith.1976.Cell-free
translation of virionRNAfromnondefective and transfor-mation-defective Roussarcomaviruses. J.Virol.
19:950-957.
30. Rothenberg, E., D. J. Donoghue,andD. Baltimore.1978.
Analysisof a 5' leader sequenceonmurineleukemiavirus 21S RNA:Heteroduplexmappingwithlongreverse tran-scriptase products.Cell 13:435-451.
31. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA
sequencingwithchain-terminating inhibitors. Proc. Natl. Acad. Sci.U.S.A. 74:5463-5467.
32. Shank,P.R., S. H. Hughes, H.-J. Kung, J.E.Majors,N. Quintrell, R. V. Guntaka, J. M. Bishop, and H. E. Varmus.1978.Mapping unintegrated avian sarcoma virus DNA: terminiof linearDNAbear 300nucleotidespresent
once or twice in two species of circular DNA. Cell 15:1383-1395.
33. Shank, P. R., S. H. Hughes, and H. E. Varmus. 1981. Restrictionendonucleasemapping ofthe DNA of Rous-associatedvirus 0 reveals extensive homology in
struc-tureand sequencewith aviansarcomavirusDNA.
Virolo-gy 108:177-188.
34. Shank, P. R., and M. Linial. 1980. Avian oncornavirus
mutant(SE21Q1b) deficient ingenomic RNA: character-ization ofadeletion in theprovirus. J. Virol.36:450-456.
35. Shealy, D. J., A. G. Mosser, and R. R. Rueckert. 1980. Novelp19-related protein inRous-associated virus type 61:implications for aviangaggene order. J. Virol. 34:431-437.
36. Sherman, F., and J. W. Stewart. 1975. The useof iso-1-cytochrome c mutantsofyeastfor elucidatingthe
nucleo-tidesequences thatgoverninitiation oftranslation,p. 175-191. InG. Bernardiand F. Gros (ed.), Organization and
expressionof theeukaryotic genome:biochemical
mecha-nism of differentiation in prokaryotes and eukaryotes,
Proceedings ofthe 10thFEBS meeting. American Else-vier, New York.
37. Shine, J., A. P. Czernllofsky, R. Friedrich, H. M. Good-man, and J.M.Bishop.1977. Nucleotide sequenceatthe 5' terminus of the avian sarcoma virus genome. Proc.
Natl.Acad. Sci. U.S.A. 74:1473-1477.
38. Stoltzfus, C. M. and L. K. Kuhnert. 1979.Evidence for the
identityof shared 5'-terminal sequences between genome RNA and subgenomic mRNA's of B77 avian sarcoma
virus. J. Virol. 32:536-545.
39. Swanstrom, R., W. J. DeLorbe, J. M. Bishop, and H. E. Varmus. 1981. Sequencing of cloned unintegrated avian sarcoma virus DNA: Viral DNA contains direct and inverted repeats similar to those in transposableelements. Proc. Natl.Acad. Sci. U.S.A. 78:124-128.
40. Swanstrom, R., H. E. Varmus, and J.M. Bishop. 1981. The terminal redundancyoftheretrovirus genome facili-tates chain elongation by reverse transcriptase. J. Biol.
Chem. 256:1115-1121.
41. Taylor, J. M. 1977. An analysis of the role of tRNA
speciesasprimers forthetranscriptioninto DNA of RNA tumor virus genomes. Biochim. Biophys. Acta 473:57-71. 42. Varmus, H. E., S.Heasley, H.-J. Kung, H. Oppermann, V. C. Smith, J. M.Bishop,and P. R. Shank. 1978.Kinetics of synthesis, structure and purification of avian sarcoma virus-specific DNA made in the cytoplasm of acutely infectedcells.J. Mol. Biol. 120:55-82.
43. Vennstrom,B., C.Moscovici,H. M.Goodman, and J. M. Bishop. 1981. Molecular cloning of the avian myelocyto-matosis virus genome, and recovery of infectious virus by transfection of chicken cells. J. Virol. 39:625-631. 44. Vogt, V. M., R. Eisenman, and H. Dlggelmann. 1975.
Generationof avian myeloblastosisvirusstructural pro-teins by proteolytic cleavage of a precursor polypeptide. J. Mol. Biol.96:471-493.
45. vonderHelm,K., and P. H.Duesberg.1975. Translation of Rous sarcoma virus RNA in cell free systems from Ascites Krebs II cells. Proc. Natl. Acad. Sci. U.S.A. 72:614-618.
46.Weiss,S. R., H. E. Varmus, and J. M. Bishop. 1977. The size and genetic composition of virus-specific RNAs in the cytoplasm of cells producing avian sarcoma-leukosis vi-ruses. Cell 12:983-992.
VOL.41,1982
on November 10, 2019 by guest
http://jvi.asm.org/