Vol. 78, No. 1, pp. 124-128, January 1981 Biochemistry
Nucleotide sequence of cloned
unintegrated avian sarcoma virus
DNA: Viral DNA contains direct
and inverted repeats similar to
those in transposable elements
(retrovirus DNA/circular DNA/terminal repeat/controlregions/primers)
RONALDSWANSTROM, WILLIAM J. DELORBE, J. MICHAEL BISHOP,AND HAROLD E.VARMUS
Departmentof Microbiology and Immunology,UniversityofCalifornia,SanFrancisco,California94143 ContributedbyJ.MichaelBishop,September16, 1980
ABSTRACT Wehave determined the nucleotide sequence of portionsof twocircularavian sarcomavirus(ASV)DNAmolecules clonedin aprokaryotichost-vector system. Theregionwhose se-quence wasdetermined represents the circle junction site-i.e., the site atwhich the ends oftheunintegrated linearDNAarefused
toform circularDNA.Thesequencefromoneclonedmolecule, SRA-2, shows that the circlejunction site isthecenterofa
330-base-pair (bp) tandem directrepeat,presumablyrepresentingthe fusion of thelong terminalrepeat(LTR)unitsknown tobepresent attheends of the linearDNA.The circlejunction siteisalso the
centerofa15-bp imperfect invertedrepeat,which thusappearsat
theboundaries of theLTR.ThestructureofASVDNA-unique
codingregionflanked byadirectrepeat thatis, in turn,
termi-natedwith ashort invertedrepeat-is verysimilar to the structure
ofcertaintransposable elements. Several featuresof thesequence
implythat circularization to form the SRA-2 molecule occurred
withoutloss of information from the linear DNA precursor.
Cir-cularization of anotherclonedviral DNAmolecule,SRA-1, prob-ably occurred byadifferent mechanism.The circle junctionsiteof the SRA-1 molecule hasa63-bpdeletion,which may have arisen
byamechanism thatisanalogoustothe integrationofviralDNA intothe host genome.Flankingoneside of the tandem direct re-peat is the binding site for tRNAT"P, the previously described
primerfor synthesis of the first strand ofviralDNA. The other sideofthe directrepeatisflankedbyapolypurinetract, A-G-G-G-A-G-G-G-G-G-A, which may represent the position of the
primerforsynthesisof thesecondstrand of viralDNA. An A+T-rich region, upstream from the RNAcapping site, and the se-quence A-A-T-A-A-A are present within the direct repeat se-quence.These sequences mayserve as apromotersiteandpoly(A) addition signal, respectively, as proposed for other eukaryotic
transcriptionunits.
Threemajor types of virus-specificDNAhave been identified
incells after infection withretroviruses:linear duplex (form III)
DNA, the initial product of synthesis by RNA-directed DNA
polymerase; covalently closed circular (form I) DNA, derived from linearprecursors;andDNAcovalently integratedintothe hostgenome(proviral DNA) (1, 2).
Physical maps of the unintegrated and integrated forms of
aviansarcoma virus(ASV)DNAhave helpedtoclarify the
struc-turalrelationshipsamongthese forms (cf. Fig. 1). Form IIIand proviral DNA are coextensive withasubunit oftheRNA ge-nome(10-13). These forms alsocontain alongterminal repeat
(LTR) of about300basepairs(bp)that is not present in the RNA genome (10-13). The repeated domain is composed of se-quences unique toeach end of theRNA(U5andU3)joined by
ashortsequence, R, thatispresent as aterminalrepeat inthe
RNA(7).Twoprincipal classes of circularDNAhavebeen char-acterized (10, 11). Oneclassbearsasinglecopyof theLTR se-quence and presumablyarises by recombination betweenthe LTRsatthe ends of linearDNA.Thesecond classcontains two
copies of the LTR sequence andprobably results fromfusion of
theends of linear DNA. The site at which sequences from op-posite ends of linearDNAhave been unitedwill be referred to
asthe circle junction site.
Numerous uncertainties remain concerningthe mechanisms ofsynthesis, circularization, and integration of retroviralDNA.
Oneapproach to these issues is a detailed examination of the structureof the various forms of viral DNA. In this report we presentdata on thenucleotide sequence at the circle junction sitefromtwomolecularlycloned ASV DNA molecules. A com-parison of the sequences at the circle junction site of the two clones indicates that circularization probablyoccurredby dif-ferent mechanisms in these cases. The sequence at one of the circle junction sites is arrangedin apattern ofdirect and in-verted repeats reminiscent of bacterial transposons. Regions important for thesynthesis and transcription of viral DNA are also apparentintheLTRandflankingsequences.
MATERIALSANDMETHODS
Materials. [a-32P]dNTPsand[y-32P]ATP (3000 Ci/mmol; 1
Ci=3.7 x
1010
becquerels)
werepurchased
fromNewEngland
NuclearandAmersham/Searle; restriction endonucleases and
polynucleotide kinase were from New England BioLabs and wereusedasdescribedbythesupplier;acrylamideand meth-ylenebisacrylamidewereelectrophoresis gradefrom Bio-Rad; reversetranscriptase(RNA-dependentDNApolymerase)from avianmyeloblastosisvirus wasthe kindgiftofJ. Beard(Life
Sci-ences,St. Petersburg, FL).
MolecularCloningofASV DNA. Theisolation and charac-terizationofthe SRA-1 andSRA-2clones ofASVDNA inthe (A)gtWes(A)B vectorhavebeen described(14). Subclones of
re-striction endonuclease fragments of SRA-2, inserted into the plasmid pBR322, were also used in this
study.
One subelone (pEcoRI-D), wasgenerated by
cleaving SRA-2 with EcoRI, thenisolating the330-bp fragmentspanning the circle junctionsite. The othersubclone (pPvu II-DG)contains a Pvu II
frag-ment, also spanning the circle junctionsite(R.
Parker, personal
communication). The amplification of all recombinant DNA
molecules was doneinaccordance with the NationalInstitutes
ofHealthGuidelinesforRecombinantDNAResearch.
Determining Sequences of ViralDNA. Sequence determi-nations weredoneby using the chemical cleavagemethod of
Maxamand Gilbert(15). DNAfrom both the
phage
andplasmid
vectors wasused.
Abbreviations: bp, base pair(s); ASV, avian sarcoma virus; LTR, long terminal repeat; U5, the sequence in the LTR in viral DNA that is unique tothe5' end of the viral RNA; U3, the sequence in the LTR thatisunique tothe 3' end of viral RNA;R, the sequence in the LTR presentatboth ends of viral RNA.
124
Proc.Natl.Acad. Sci. USA 78 (1981) 125
RU gag pol env src U, R
5'CapLL -A-A-A3'
(-)PB
U3 R U5 p@ env src U3 RU5
a 330R 1L
iR Circle junction
U3 RU5 U3
Soc = MMMMMWq17~7
b
330L RU5 (-)PB
IIiJ=6'#=gag
(-3PB/\
Circle junction (-)PB (-)PB
U RU U, RUU
U,RU, f U3 RU5
(-)PB
c
FIG. 1. Outline of ASVDNAsynthesisand integration. (a)
Ge-netic mapof the RNAgenomewiththe regions relevanttoviral DNA synthesis.gag,pol,andenv arethe viralgenesrequiredfor replication andsrc isthe viraltransforminggene(3).A 16- to21-nucleotide ter-minal repeat is present inthe RNAand thissequence isdenotedR
(4-6).Thestippled boxrepresentstheU5 region,definedastheregion
between thetRNATrPbindingsiteand theR sequence atthe5'end of the RNA(7).Thehatched boxrepresentstheU3region,definedasthe
sequence atthe3'end oftheRNA(excluding R) thatisduplicated during the generationof the terminal repeat intheDNA (7). The tRNA"rPbindingsite islabeled (-)PB, the bindingsitefortheprimer
of the first(minus)strand of viralDNA(8, 9).InthisdiagramtheU5,
R,andU3regions areshown on anexpanded scale. (b) Structureof
unintegrated duplex linearDNA.The linearDNA is coextensivewith theRNA but contains a LTRcomposed oftheU3,R,andU5regions (10, 11). (c)Structures of thetwoforms of circularDNA.Thetwoforms differ by thepresenceofone or twocopiesof theLTR sequence (10, 11).
Thesitewhere theends of the linearDNA arefusedisreferredto as
the circlejunction.(d) The integrated(proviral)DNA is also coexten-sivewiththe RNA andcontains the LTR sequence ateachboundary withhostsequences(12, 13).
RESULTS
Strategy for Determining the Sequence of the Circle
Junc-tionRegion. Wehaverecently cloned circularASV DNA
iso-lated from cells infected with theSchmidt-Ruppin Astrainof
EcoRI
HinfI A* A A
Hpa II A'*
Sau3AI * * A* A
FIG. 2. Strategy fordetermining the nucleotidesequenceofthe
cir-clejunction region ofSRA-2.Aphysicalmapofthecirclejunction region
withtherestrictionendonucleasesitesof fourenzymesusedinDNA
sequencingisshownrelativetothe regions illustrated inFig.1.The
nucleotidesequences infragments ofDNAgenerated byrestriction
en-donucleasecleavageweredeterminedby using the chemicalcleavage methoddescribed byMaxamand Gilbert(15).Thearrowsindicate the direction ofsequencingfrom thevarious sites.Thenumberingsystem isdescribedinthe textandinthelegendtoFig.3.
ASV(14). Twoclones, SRA-1 and SRA-2, were chosen for study becausethey appeared from restriction endonuclease mapping to contain twocomplete, or nearly complete, copies ofthe LTR
sequence. These two clones were amplified and subcloned for further mapping and nucleotide sequenceanalysis.
Asequence determination strategy utilizing restriction en-donuclease sites around the circle junction site of SRA-2 is
shown inFig. 2. This figure is drawn so that the region from the right end oflinear DNA appears to the left ofthe circle junction; theregionfrom the left endistotheright of thejunction. The
sequencedetermined for 350 nucleotides on either side of this site ispresentedinFig. 3. A summaryof restriction endonucle-ase siteswithin thissequence is available upon request.
The Circle Junction is the Centerofa330-bp Direct
Re-peat. The most obvious feature of the sequence showninFig. 3 isthepresence ofa330-bp perfect direct repeat. We have ori-ented therepeatwithrespecttothe ends of the linear DNAby restriction endonuclease mapping (14). One copy of the re-peated domain, numbered IR to330R, isthecopyproximalto
-20R -lOR CTTTTGCATA GGGAGGGGGA GAAAACGTATCCCTCCCCCT
>______________ 20R 30R 40R 5OR 60R 70R 80R 90R 10OR 11OR
AATGTAGTCT TATGCAATAC TCTTGTAGTC TTGCAACATG GTAACGATGA GTTAGCAACA TGCCTTACAA GGAGAGAAAA AGCACCGTGC ATGCCGATTG GTGGAAGTAA
TTACATCAGA ATACGTTATG AGAACATCAG AACGTTGTAC CATTGCTACT CAATCGTTGT ACGGAATGTT CCTCTCTTTT TCGTGGCACG TACGGCTAAC CACCTTCATT
120R 130R 140R 15OR 160R 170R 180R 190R 200R 21OR 220R
GGTGGTACGA TCGTGCCTTA TTAGGAAGGC AACAGACGGG TCTGACATGG ATTGGACGA CCACTGAATT CCGCATTGCA GAGATATTGT ATTTAAGTGC CTAGCTCGAT
CCACCATGCT AGCACGGAAT AATCCTTCCG TTGTCTGCCC AGACTGTACC TAACCTGCTT GGTGACTTAA GGCGTAACGT CTCTATAACA TAAATTCACG GATCGAGCTA
U3 <I> R <1>U5 26pR 27pR 280R 290R 300R 310R U5 Circle
ACAATAAACG CCATTTGACC ATTCACCACA TTGGTGTGCA CCTGGGTTGA TGGCCGGACC GTTGATTCCC TGACGACTAC GAGCACCTGC ATGAAGCAGA AGGCTTCATT
TGTTATTTGC GGTAAACTGG TAAGTGGTGT AACCACACGT GGACCCAACT ACCGGCCTGG CAACTAAGGGACTGCTGATG CTCGTGGACG TACTTCGTCT TCCGAAGTAA
Juxction U3 20L 3CL 40L 50L 60L 70L 80L 90L 100L 1lOL
AATGTAGTCT TATGCAATAC TCTTGTAGTC TTGCAACATG GTAACGATGA GTTAGCAACA TGCCTTACAA GGAGAGAAAA AGCACCGTGC ATGCCGATTG GTGGAAGTAA
TTACATCAGA ATACGTTATG AGAACATCAG AACGTTGTAC CATTGCTACT CMTCGTTGTACGGAATGTT CCTCTCTTTT TCGTGGCACG TACGGCTAAC CACCTTCATT
120L 130L 140L 150L 160L 170L 180L 190L 200L 210L 220L
GGTGGTACGA TCGTGCCTTA TTAGGAAGGC MCAGACGGG TCTGACATGG ATTGGACGAA CCACTGAATT CCGCATTGCA GAGATATTGT ATTTAAGTGC CTAGCTCGAT
CCACCATGCT AGCACGGAAT AATCCTTCCG TTGTCTGCCC AGACTGTACC TAACCTGCTT GGTGACTTAA GGCGTAACGT CTCTATAACA TAAATTCACG GATCGAGCTA
U3 <|> R <I> U5 260L 270L 280L 290L 300L 310L U5 <
ACAATAAACG CCATTTGACC ATTCACCAC TTGGTGTGCA CCTGGGTTGA TGGCCGGACC GTTGATTCCC TGACGACTAC GAGCACCTGC ATGAAGCAGA AGGCTTCATT TGTTATTTCC GGTAAACTGG TAAGTGGTGT AACCACACGT GGACCCAACT ACCGGCCTGG CAACTAAGGG ACTGCTGATG CTCGTGGACG TACTTCGTCT TCCGAAGTAA
-10L -20L
TGGTGACCCC GACGTGATAG
ACCACTGGGG CTGCACTATC
FIG. 3. Sequence surroundingthe circle junction site. The sequence determined for 350 nucleotides on either side of the circle junctionis illus-trated. Therepeateddomain isnumbered 1 to 330, starting with theboundaryintheU3 region. The two copies oftherepeateddomain aredistinguished byan L(left)or R(right) followingthe number.Leftandrightrefer to theends ofthe linear DNA as presented in Fig. 1. Note thattheregionfromthe
left end of linear DNA is showntotherightof the circle junction site and that the region from the right end of linear DNA is to the left of the circle junction site.Negativenumbers refer to sequencesflankingtherepeated domain, numberingfrom the boundaries.
-\\ - gag put 77
Biochemistry:
Swanstrometal.nnl
src andpresumably represents theLTRsequence from the right endof the linearDNA.The othercopyof therepeated domain, numbered iL to 330L, is thecopyproximaltogag and presum-ably represents theLTRsequence fromthe leftend of thelinear DNA. Theflanking unique sequencesare denoted with nega-tivenumbers, startingateachboundaryof the directrepeat.
Portionsof the sequence within the repeated domain can be recognizedbycomparison to previous sequencedetermination studies. The U5 region (see Fig. 1) is present at positions 251 to330.There are a total of sevennucleotidesubstitutions in this regioncompared to the sequence of the Prague C strain of virus used in previous cDNA sequencedeterminationstudies (4, 5). The R sequence, positions 230 to 250, represents the short re-dundancy at the ends of the genome RNA (4-6). The U3 region coverspositions 1 to 229. The sequences of portions of the U3 region have been determined previously by using oligo(dT)-primed cDNA transcripts from the 3' endof the RNAgenome (D. E. Schwartz and W. Gilbert, personal communication) and by usingcloned cDNAs synthesizedwith ASV mRNA as
tem-plate (16). The portionofthesequence from 170Rto -20R has also beendeterminedbythedideoxymethod,usingasubclone oftheSRA-2 clone astemplate (17).
Wehaveidentified the circle junction sitebyassuming that the first nucleotide of the U5 sequence should represent the
extremerightend of linear viralDNA(seeDiscussion). Onthe basisof this assumption the circle junctioncanberecognized
as the center of the direct repeat because the entire U5
se-a
310R 330R iL 20L ..CATGAAGCAGAAGGCTTCATT AATGTAGTCTTATGCAATAC..
*_G
..AATGAAGCAGAAGGCTTCATT ..CATGAAGCAGAAGGCTTCATT b
SRA-2
Circle Junction
Prague C U5
B77 U5
310L 330L -1L -20L LTR Flanking ..CATGAAGCAGAAGGCTTCATT TGGTGACCCCGACGTGATAG.. Sequence: (-)PB
4-ACCACUGGGGCUGCACUAmAG.
-20R -1R 1R 20R
..GCTTTTGCATAGGGAGGGGGA AATGTAGTCTTATGCAATAC..
.TTTAAAAAGAAAAAAGGGGGA AATGCCGCGCCTGCAGCAGC..
d
200 220 U3
.GAGATATTGTATTTAAGTGCCTAGCTCGATACAATAAAC
240 260
GCCATTTGACCATTCACCACATTGGTGTGCA..
1> R A A1
3' Terminal Trp
Sequence: tRNA
LTR Flanking
Sequence: (+)PB?
U3 Boundary:
MMTV
Possible Control Regions
FIG. 4. Summary of the ASV DNAsequenceinthe circle junction
region.Thenumbering isasdescribed in the legendtoFig. 3. In each
casetheplus strandsequenceisshown.(a) Thesequenceatthe circle junction site is comparedtoaportion of thepublished U5sequences
oftwoclosely related viruses (4, 5). Thearrowdenotes theinverted
repeatpresent atthe circle junction. The underlined nucleotidesare
mismatched in therepeat.(b)Thesequenceatoneofthe boundaries
ofthe directrepeat(proximaltothegaggene) andthe flanking se-quence arecompared to the3' terminalsequenceof chicken tRNATrP
(8). Thehomology withtRNATrPidentifies this region of viral DNA
ascoding for the primer binding site, (-)PB. (c) Thesequenceatthe otherboundary of the directrepeat(proximaltothesrcgene) and the
flankingsequencecontainingthe polypurinetractarecomparedtothe
sequence at the equivalent site in murine mammary tumor virus (MMTV) DNA (18, 19). Theboundarymayrepresentthesite ofpriming
of the second strand of viralDNA, (+)PB. (d) Sequenceswithinthe LTR sequencethatmaybeinvolved in control of transcription are
underlined and described in thetext.The asterisk denotesthe nucleo-tide adjacentto the m7Gcapinthegenome RNA(4,5, 20-22). The
vertical arrowheadsshow thepositionsof thepolyadenylylationsites
ingenomeRNA(6).
quence is present(251Rto330R), terminatingat the center of
the direct repeat(Fig. 3). The relevant portions of the U5
se-quencesfrom the Prague C and B77 strains of ASV (4, 5) are comparedtothesequencefrom thecloned DNA, and allgive anequivalentsequence (Fig. 4a). It would appear that
circu-larization occurred without loss of information from at least the right end of the linearDNA. The sequence starting with po-sition IL must therefore represent information from the left end, perhaps the exact left end, of the linear DNA (see Discussion).
Anexaminationof the circlejunctionsite revealsthat it isnot
onlythe center of the330-bpdirectrepeat butalso the center of a15-bp imperfectinvertedrepeat, orpalindrome, with 12 out ofthe15nucleotidesformingtherepeat(Fig. 4a;316Rto330R
and LL to15L). Becausethecirclejunctionsite isthecenterof the directrepeat, thesequencesrepresentingtheinverted
re-peatarealso presentatthe boundaries of thedirect repeat (IR
to 15Rand316L to330L) (Fig. 3).
SequencesFlankingthe DirectRepeat. Theprimer for syn-thesisof the first strand of viralDNA isacellulartRNATrPwhich
isbound to thegenome 101nucleotidesfrom the 5' end ofthe
viralRNA(4, 5, 8, 9). Flanking the right boundary of thedirect
repeat, proximal to thegag gene, is an 18-nucleotide stretch
havingperfect homology to the 3' end of chicken tRNATrP (8) (-1L to -18L; Fig.4b). The regionofhomologyterminates at thepositionof a m'A in thetRNATIP sequence.
The leftboundaryof the directrepeat, proximal tothesrc
gene, may representthe siteof initiationofsynthesisof the sec-ondstrand of viralDNA("plus strong stopDNA,"see
Discus-sion). The sequenceflankingthisboundaryshouldthen
repre-sentthepositionof theprimerforplus strong stop DNA. The
leftboundary of the direct repeatis flanked byapolypurine tract, A-G-G-G-A-G-G-G-G-G-A, inthe plus strand (-lRto -llR; Fig.4c). It islikely that thispolypurinetractrepresents
aportion of the sequenceof theasyetunidentified primerfor
thesecond strand of viralDNA.
Transcriptional Control Regions Within the LTR
Se-quence. The LTRsequence mayserveregulatoryfunctionsin
thesynthesis and processing ofviralRNA(7).Anexaminationof the LTRsequence suggeststhatcontrolregionsarepresent. (i)
AnA+T-rich region25-30nucleotides upstream from theRNA
cappingsitemayserveapromoter-likefunctioninthesynthesis
ofeukaryoticmRNAs(23). Thesequenceonthe 5'side of the ASV RNAcappingsite contains anA+T-richregion (T-A-T-T-T-A-A) starting24nucleotidesupstreamfrom thecappingsite
(Fig. 4d). (ii)The sequenceA-A-T-A-A-AappearsintheLTR
domain(Fig. 4d), and this sequence has beenimplicatedas a
poly(A)addition signal in eukaryoticmRNAs (24). Seventeen to 22 nucleotides downstream from this sequence are three poly(A)addition sitesinviralRNA(6).
ACircle JunctionSequence LackingPortionsof U3andU5. Weexamined the nucleotide sequenceatthe circlejunctionsite ofanother clonedmolecule, SRA-1. Previousmapping
exper-iments had shown that SRA-1 has ashortdeletion within the
LTRsequence (14). Further mapping experiments placedthe deletionatornearthecircle junctionsite(datanotshown).The nucleotide sequence of the circle junction site of SRA-1 is
shown in Fig. 5. By comparisontothe SRA-2sequence(Fig. 3)itcanbe seen thatthereis adeletionatthe circle junction site, with2nucleotides missing from the U5 sequence and 61
nucleotides missing from the U3 sequence. The mapping data indicate that the remainder of each copy of theLTRsequence
isintact. Because wehavenotobserved similar deletionsduring propagation ofthe SRA-2cloneineitherthephageorplasmid vector,we assumethat thedeletion presentin SRA-1occurred
Proc.Natl. Acad.Sci. USA 78 (1981) 127
320R 328R 62L 70L
Biochemistry:
Swanstometal...TGAAGCAGAAGGCTTCA GCCTTACAAGGAGAGA,...
U5< >U3
FIG. 5. Sequenceatthe circlejunction siteof SRA-1. The number-ing isthesame asfor the sequence-determined by usingtheSRA-2
clone (Fig.3).
DISCUSSION
Thecirclejunction siteof the clonedASV DNAmolecule (SRA-2)isthecenterofa330-bp directrepeat. Webelieve thedirect
repeatinthe circularDNAaroseby fusion of the ends of linear
viralDNAand that the directrepeat represents the complete
copiesof the repeated domainscomposingthe terminalrepeat
(LTR)attheends of the linearDNA. The circlejunctionisalso the center ofa 15-nucleotide-long inverted repeat that
con-cludes the LTR. The overall sequence organization of ASV
DNA-a unique sequence (here the ASV coding region),
flanked byadirectrepeat(the LTR),withtherepeated domains concluded byashortinvertedrepeat-isanalogoustothe struc-ture ofcertain prokaryotic and eukaryotic transposable ele-ments (25-28). It is tempting to speculate that homologous structures have been selected asthe consequenceofan event commontothe replication ofretrovirusesandthe mechanism of
transposition(for example, theinsertion ofDNAintonewsites
inthe host genome). However, at present thereare nodatato suggest any common mechanism utilizing these homologous
structures.
Theinterpretation that the directrepeatcentered atthe cir-clejunction siterepresentstheentiresequencewithin theLTR
is based upon the presence of theentire U5 sequence at the circlejunction site. Studiesin vitroandin vivosuggestthat the right end of the linearDNAisdefined by theregion
immedi-ately adjacent to the tRNATIP binding site, the U5 sequence
(refs. 3, 10, 29,and30; Fig. 1).TheentireU5sequenceis
pres-ent at the circle junction site, suggesting that circularization occurred without loss of information fromatleast the right end, and, by inference, from either end of the linearDNA.
Three otherobservationsarguethat theentiresequencefrom
the left end of linear DNA ispresent at the circlejunction in SRA-2 DNA. First, restrictionendonuclease mappingoflinear DNA (10) places the left end 150 to 200 bp to the left of the
EcoRIsite atposition 178 in the U3region. Second, the
pres-enceof invertedrepeats atthe ends of theDNAgivesstriking symmetry tothe structure of viral DNA. Inverted repeats of various lengths between 3 and 21 nucleotides have been ob-servedatthe circlejunctionofacloned MuLV DNAmolecule
(31) and at the host-virusjunction of several cloned proviral DNAs(18, 19,32-34).Thethird relevant observation also con-cernsthestructureofproviralDNA.Nucleotidesequence
anal-ysesof thejunctionsof hostand proviralDNAhaveshown that
theprovirus isusuallymissingtwobasepairsfrom theU5region (16, 18, 19, 32, 33, 35); theseanalyses have also shown that the lasttwo nucleotides of viralsequenceatthe left end of proviral
DNAareT-G(18, 19,32-34). Assumingthattwobasepairsare
lost from each end of viral DNA duringintegration, then the
left end of linearDNAistwobasepairsfrom this dinucleotide.
ASVmayfollowthispattern, sinceaT-Gdinucleotide appears
intheU3sequence,twobasepairsfrom the circlejunction site.
Thusweinfer that thesequencesattheexactends of linearASV
DNA are represented by the sequence composing the circle
junction site intheSRA-2 molecule.
TheLTR, like several transposable elements(25), maycarry
signals involvedin transcription (Fig. 4d). Sequences present within the LTRunitprobably affectinitiation oftranscription
of viral genesand signal
polyadenylylation;
they may also oc-casionallypromote transcriptionofflanking cellularDNA(ref.36; G. Payne, personalcommunication).
There are three regions within the sequencepresented in Fig. 3 that are relevant to viral DNAsynthesis:theRsequence and theregionsflanking either side of the directrepeat.
(i) We have noted that thesequencerepresentingthe short terminalrepeat in genomic RNA is presentonly onceinthe LTR
sequence in DNAsynthesizedin vitroorinvivo(37). This
ob-servation conforms to the model for DNA synthesis in which
a DNA copyof the R sequence, from the 5' endof the RNA
genome, base-pairs with the R sequence at the 3' end of the
RNA to allow DNA synthesis tocontinuebeyondthe 5' end of the template (7).
(ii) The regions flanking either side ofthedirectrepeat prob-ablyrepresent sequencesrelatedtotheprimersforthefirst and
second strands of viral DNA. Immediately flanking the right boundary of the direct repeat, proximal to gag, is an 18-nu-cleotide stretch homologoustothe3' terminusoftRNATIP, the
primer for synthesis of the first strand of viral DNA(8). The
length of homology between the viral sequence and tRNATrp
that we have detected extends slightly thelengthofhomology
previously throught to bepresent, as determined by RNA
se-quencing (38, 39). The regionof homologyterminateswith the
appearance of the first modifiednucleotide, m'A, in the tRNA.
(iii) The major species ofplus strand DNA identified in
in-fected cells is about 300 nucleotides long and is synthesized
using the right endof the minus strand (the U3-R-U5 region)
as template (29). This specieshasbeenreferred to as plus strong
stop DNA (40). A model has beenproposed inwhich the minus
strand is completed by copyingplus strongstopDNA (10, 40).
Inthiswaythe U3-R-U5 regionthatis presentatthe right end
of the minus strand is duplicated atthe leftend, forming the
second copy of the LTR sequence. InthismodelforDNA
syn-thesis the site at whichplusstrongstopDNA is primed defines
the boundary of the U3regionintheLTRsequence. Using the
circle junction site to define theboundaryofthe U3region, we
find a polypurine tract, A-G-G-G-A-G-G-G-G-G-A, intheplus
strand at the site expected for theprimer forplus strongstop
DNA. A polypurine tract 21 nucleotides long ispresent in an
equivalent position in murine mammary tumor virus DNA,
with11nucleotides at the U3boundarybeingidenticalbetween
ASV and murine mammary tumorvirus (Fig. 4c; refs. 18 and
19). It is likely that this polypurine tractrepresents a portion
of the sequence of the primer forplus strong stop DNA,
al-though we do not yetknowthe nature ofthat primer.
Our data indicatethatcircularizationcan occur by more than
one mechanism. In addition to circular DNA molecules that
appear to contain eitheroneortwocopies of the LTRsequence
(10, 11, 14), we have identified a molecule that contains two
copies of theLTR sequence, eachofwhich hassuffered a
dele-tion. The lossoftwobasepairs from the U5 region isanalogous
to the absence of thetwoequivalentbase pairs ofother
retro-viruses in proviral DNA (16, 18, 19,32,33, 35). For thisreason we suspect that the DNAmoleculecarrying thedeletionatthe
circle junction sitearoseby recombination involvinga
mecha-nism similartointegration. Shoemakeret al. (31) have also en-counteredunusualcircularmoleculesof murine leukemiavirus DNAcontaining rearrangements that theyinterpret asarising
by integration of one region of the DNA into another region
of the DNA. Those authors emphasize models in which the
rearrangementofviral DNA followscircularization. Inthecase we haveexamined it issimpler to imagine that oneend of linear
fromthe otherend. Ju etal. (41) and Highfieldetal. (42)have observed another class of circularDNAmolecules;mostof the molecules they foundare notsimilar toeitherSRA-1 or SRA-2butcontainvariabledeletionsatthe circlejunction sitewithin theU3and U5regions. Itseemslikely that circular viralDNA
molecules found in vivo are a heterogeneous population of
structures arisingbyseveral mechanisms.
WethankJ. Majorsformanyhelpful discussions,J. Majors, R.
Par-ker, D.Schwartz, and G. Payneforcommunicationof results prior to publication, andH. MartinezandP. Czernilofsky forassistancewith
computeranalysis of thesequence. Wealso thankJ.Migneault for
ex-cellent stenographicassistance.This work wassupported byU.S.Public HealthServiceGrants CA 12705andCA19287,TrainingGrant IT32 CA 09043, andAmericanCancerSocietyGrant VC-70. W.J.D. holds afellowship from the LeukemiaSocietyofAmerica.
1. Weinberg,R. A. (1977)Biochim. Biophys.Acta473,39-56. 2. Shank, P. R. &Varmus, H. E. (1978)J.Virol.25,104-114. 3. Bishop,J. M. (1978)Annu.Rev.Biochem.47,35-88.
4. Shine, J., Czernilofsky, A. P., Friedrich, R., Goodman, H. M. &Bishop,J. M.(1977)Proc.Natl Acad.Sci. USA74, 1473-1477.
5. Haseltine, W. A., Maxam, A. &Gilbert, W. (1977)Proc.Natl Acad. Sci.USA74,989-993.
6. Schwartz,D.E.,Zamecnik,P.C. &Weith,H.L.(1977)Proc.Natl. Acad.Sci.USA74,994-998.
7. Coffin,J. M.'(1979)J. Gen.Virol42, 1-26.
8. Harada,F.,Sawyer,R.C. &Dahlberg, J.D.(1975)J.BioLChem. 250, 3487-3497.
9. Taylor,J. M. &Illmensee,R. (1975)J.Virol. 16, 553-558. 10. Shank, P. R., Hughes, S. H., Kung,H..J., Majors,J. E.,
Quin-trell, N., Guntaka,R. V.,Bishop, J. M. &Varmus, H. E.(1978) Cell15, 1383-1395.
11. Hsu, T. W.,Sabran,J. L., Mark,G. E.,Guntaka,R. V. &Taylor,
J. M.(1978)J. Virol28,810-818.
12. Hughes, S. H., Shank, P. R., Spector, D. H., Kung, H.-J., Bishop, J. M., Varmus, H. E., Vogt, P. K. &Breitman, M. L. (1978) Cell15, 1397-1410.
13. 'Sabran, J. L., Hsu, T. W.,'Yeater,C., Kaji, A., Mason,W.S.&
Taylor,J. M. (1979)J.ViroL 29, 170-178.
14. DeLorbe,'W.J., Luciw, P. A.,Goodman,H.M., Varmus,H. E. &Bishop,J. M.(1980)J.Virol 36,50-61.
15. Maxam, A. M. & Gilbert, W. (1980) Methods Enzymol 65, 499-560.
16. Yamamoto, T., Tyagi, J. S., Fagan, ,J. B., Jay, G., deCrom-brugghe, B.&Pastan, I. (1980)J.Virol.35,436-443.
17. Czernilofsky,A.P.,DeLorbe,W., Swanstrom,R.,Varmus,'H.E., Bishop,J. M.Tischer,E. &Goodman,H. M.(1980)NucleicAcids Res.8, 2967-2984.
18. Majors, J. E. &Varmus, H. E. (1980)inAnimalVirusGenetics,
ICN-UCLA Symposia on Cellularand Molecular Biology, eds. Fields, B.,Jaenisch, R.& Fox, C. F. (Academic, NewYork),in press.
19. Majors, J. E. &Varmus, H. E.(1980)Nature(London),inpress. 20. Beemon, K. L. &Keith,J. M. (1976)inAnimalVirology,
ICN-UCLA Symposia onCellular and Molecular Biology, eds.
Balti-more, D., Huang, A. & Fox, C. F.(Academic,NewYork),Vol.4, pp. 97-105.
21. Furuichi,Y.,Shatkin,A.J., Stavnezer, E. &Bishop,J. M.(1975)
Nature(London) 257,613-620.
22. Keith,J. &Fraenkel-Conrat,H.(1975)Proc.Natl Acad.Sci.USA 72,3347-3350.
23. Ziff,E. B. & Evans, R. M.(1978)Cell15,1463-1475.
24. Proudfoot,N.J. &Brownlee,G. G.(1974)Nature(London) 252, 359-362.
25. Calos, M. P. &Miller,J. H.(1980)Cell 20,579-595.
26. Farabaugh, P. J. & Fink, G. R. (1980) Nature (London) 286, 352-356.
27. Gafner,J. &Philippsen,P.(1980)Nature(London)286,414-418. 28. Finnegan, D. J.,Rubin,G. M., Young, M.W.&Hogness, D. S.
(1978) Cold Spring Harbor Symp.Quant. Biol. 42, 1053-1063. 29. Varmus, H. E.,Heasley,S., Kung, H.-J., Oppermann, H.,Smith,
V. C.,Bishop,J. M. &Shank,P.R.(1978)J.MolBiol120,55-82.
30. Taylor,J. M.,Illmensee, R.,Trusal, L. R. &Summers, J.(1976)
inAnimalVirology,ICN-UCLA Symposia onCellularand Molec-ular Biology, eds.Baltimore, D., Huang, A. & Fox, C. F. (Aca-demic,NewYork), Vol.4, pp. 161-174.
31. Shoemaker,C., Goff, S.,Gilboa, E.,Paskind, M., Mitra, S. W.
&Baltimore,D.(1980)Proc.NatlAcad.Sci. USA77,3932-3936. 32. Dhar,R., McClement, W. L.,Enquist, L. W. &VandeWoude,
G. F.(1980)Proc.NatlAcad.Sci. USA77,3937-3941.
33. Shimotohno, K., Mizutani, S. & Temin, H. M. (1980)Nature (London)285,550-554.
34. VanBeveren,C.,Goddard,J.G., Berns, A. & Verma, I. M.(1980)
Proc. Natl. Acad. Sci.USA77, 3307-3311.
35. Hager, G. L. & Donehower, L. A. (1980)inAnimalVirus Ge-netics,ICN-UCLA Symposia onCellular and MolecularBiology, eds.Fields,B.,Jaenisch,R. &Fox,C. F. (Academic,NewYork), inpress.
36. Quintrell, N., Hughes, S. H., Varmus, H. E. &Bishop, J. M. (1980)]. Mol.BioL,inpress.
37. Swanstrom, R., Varmus, H. E. &,Bishop, J. M. (1980)J. Biol
Chem.,inpress.
38. Cordell,B., Stavnezer,E.,Friedrich,R.,Bishop,J. M.& Good-man, H. M.(1976)J.Virol.19,548-558.
39. Coffin,J. M. &Haseltine, W.A.(1977)J.Mol Biol117,805-814. 40. Gilboa,E., Mitra, S.W., Goff,S.&Baltimore,D.(1979)Cell18,
93-100.
41. Ju,G.,Boone,.L.&Skalka,A.M.(1980)J.Virol33,1026-1033. 42. Highfield, P. E.,Rafield, L.F., Gilmer,T. M. &Parsons, J. T.