Shared Sequence Variants of Mus spretus LINE-1 Elements Tracing Dispersal to Within the Last 1 Million Years
N. Carol Casavant and Stephen C. Hardies
Department of Biochemistry, University of Texas Health Science Center at San Antonio, San Antonio, Texas 78284-7760
Manuscript received August 13, 1993 Accepted for publication February 9, 1994
ABSTRACT
LINE-1 repetitive sequences contain a record of an evolving population of transposons within the mammalian genome. Of the 100,000 copies of LINE-1 sequences per genome there are many shared sequence variants representing changes occurring within the propagating LINE-1 elements themselves, rather than changes that occur during retrotransposition or after an element inserts in the genome. These shared sequence variants define families of LINE-1 elements which have spread within specific periods of time. We have been interested in studying events in LINE-1 evolution since the speciation of Mus spretus and Mus domesticus approximately 3 million years (Myr) ago. To do this, we have collected LINE-1 sequences that have shared sequence variants specific to M. spretus. The sampled LINE-1 elements were sequenced at their extreme 3' ends, where the density of sequence variants is highest. The new sequences define six new M. spretus-specific sequence variants. Of these, we have found one that could be used to screen for LINE-1 elements arising in the last ~1 Myr, which we argue is a critical sample for understanding the dynamics of LINE-1 propagation.
LINE-1 elements are repetitive sequences that have been dispersed to approximately 100,000 sites in the murine genome via transposition [for reviews see SKOWRONSKI and SINGER (1986), SCOTT et al. (1987), EDGELL et al. (1987), HUTCHISON et al. (1989) and DEININGER et al. (1992)]. The LINE-1 elements are organized into a small number of families in each murine species (MARTIN et al. 1985; HARDIES et al. 1986; RIKKE et al. 1991), with each family characterized by a particular set of sequence variants. These shared sequence variants are acquired by propagating elements that pass them on to future generations, whereas other substitutions that are acquired in a pseudogene copy and are not passed on to future generations are called "private" variants. The shared sequence variants reflect the sequence of propagating parental elements, and the different families represent the existence of more than one propagating element. Analysis of the shared sequence variants therefore yields information about the number of propagating elements and how they are related to one another.
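The distinction between shared and private variants can be illustrated with a minimal sketch. It assumes pre-aligned sequences of equal length and an ancestral reference; the function name and the toy sequences are hypothetical and are not data from this study. A difference carried by two or more elements is called shared, and a difference seen in only one element is called private. The sketch does not address coincident (parallel) substitutions, which must be judged separately.

```python
def classify_variants(ancestor, elements):
    """Split differences from the ancestor into shared and private variants.

    ancestor: reference sequence (str)
    elements: dict mapping element name -> aligned sequence of the same length
    Returns (shared, private):
      shared  maps (position, base) -> sorted names of the elements carrying it (2 or more)
      private maps element name -> list of (position, base) seen only in that element
    """
    carriers = {}
    for name, seq in elements.items():
        if len(seq) != len(ancestor):
            raise ValueError(f"{name} is not aligned to the ancestor")
        for pos, (anc_base, base) in enumerate(zip(ancestor, seq)):
            # Gaps and unknown bases are ignored in this sketch.
            if base != anc_base and base not in "-N":
                carriers.setdefault((pos, base), []).append(name)

    shared = {key: sorted(names) for key, names in carriers.items() if len(names) >= 2}
    private = {name: [] for name in elements}
    for (pos, base), names in carriers.items():
        if len(names) == 1:
            private[names[0]].append((pos, base))
    return shared, private


# Hypothetical toy alignment (0-based positions), for illustration only.
toy_ancestor = "ACGTACGT"
toy_elements = {"e1": "ATGTACGT", "e2": "ATGTACGA", "e3": "ACGTCCGT"}
toy_shared, toy_private = classify_variants(toy_ancestor, toy_elements)
```

In this toy case the T at position 1 is shared by e1 and e2, while e2 and e3 each carry one private difference.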
Previous studies of shared sequence variants have indicated that large families amplified at various points in the past, sometimes splitting into two subfamilies, and sometimes dying out as new families arose. For example, comparisons of 5' ends of murine LINE-1s have indicated that new families can arise (referred to as A, F, or V type LINE-1) that have had their 5' control sequences replaced (ADEY 1991; SCHICHMAN et al. 1992; SCHICHMAN et al. 1993; JUBIER-MAURIN et al. 1992). Anciently derived families have been described in mouse and rat (PASCALE et al. 1993), and in Peromyscus (KASS et al. 1992). Splitting of LINE-1 into multiple families has also been described in man (SCOTT et al. 1987; JURKA 1989). These comparisons paint a broad picture of the evolution of LINE-1; however, the details of the process have been more difficult to ascertain.
Genetics 137: 565-572 (June, 1994)
In contrast to the large families described above, DOMBROSKI et al. (1991) defined a family of only six human LINE-1 elements that includes a currently active propagating element as judged by having made a copy within a currently living human. Although this family apparently dates back 6 million years (Myr), it has contributed only slightly to the human LINE-1 total replicative capacity. The question becomes whether there are many small replicating subfamilies within the umbrella of a large amplifying family, or if these six members constitute a nonrepresentative family.
To address the existence of small subfamilies in the mouse, we will need to sequence a set of LINE-1 elements from the analogous time period for the mouse. The time period in question has been ascertained for the mouse from the analysis of substitution patterns in LINE-1 coding sequence (HARDIES et al. 1986). This work closely analyzed the pattern of substitution in LINE-1 sequences after they had become separated from the major propagating lineage, which was called a molecular driver. The result was that after separating from the molecular driver, LINE-1 sequences appeared to have had a period of about 1 Myr in their history during which their coding sequences were still under selective pressure. The sequence that was under selective pressure could not have been the element that was
presumed to be truncated. That means that most elements must have propagating parents in their histories that were separate from the major propagating lineage for about 1 Myr. Those parents only made small families, because no more than one progeny from each parent is ever sampled. All of this means that small families are active in the mouse for about 1 Myr before they either die out or become big families. Therefore, if small subfamilies are to be observed while they are still active, sampled elements must be less than 1 Myr old.
If small subfamilies of this kind are active today, they should exist as the youngest components of larger families that have spread successfully in the last few Myr. So, to find 1-Myr-old subfamilies, we started by defining a LINE-1 family that is both large and vigorously propagating, and therefore would have a suitable number of 1-Myr-old members. A suitable family of about 9000 members has been recently identified in Mus spretus, characterized by the presence of two specific sequence variants named Ms496 and Ms416 (RIKKE et al. 1991; RIKKE and HARDIES 1991). These shared sequence variants have been acquired and amplified in less than 3-5 Myr, the time since speciation between Mus domesticus and M. spretus (BONHOMME et al. 1984). In this study, we have collected LINE-1 sequences from this family and expanded the list of its characteristic sequence variants, looking to define even younger subfamilies.
We searched for additional shared sequence variants in the Ms496 family by a targeted polymerase chain reaction (PCR) amplification and cloning strategy. To obtain the maximum number of sequence differences within a short segment of the LINE-1 element, we chose to sequence the 3'-most region (adjacent to the polyadenylation site) for two reasons: (1) this region is the most populous among LINE-1 elements and (2) the 3'-untranslated region has acquired mutations within the propagating elements at a faster rate (RIKKE et al. 1991). In addition, this study extends the characterized region within the M. spretus-specific population first described by RIKKE et al. (1991) by an additional 469 bases for a total of over 900 bases characterized.
The majority of sequence variants observed were specific to M. spretus and were shared among the reported members of the Ms496 family. The relative lack of unshared mutations was expected since there has been little time after the dispersal of the individual LINE-1 elements for them to acquire private mutations. Therefore, the majority of sequence differences observed within the collected family members define the sequence of actively propagating LINE-1 elements. A total of six new M. spretus-specific shared sequence variants were identified and ordered within the Ms496 family.
The sequences analyzed to date do not yet tell us the relationship between small and large families. To answer that question, we need to further increase the sample of sequences confined to the last 1 Myr. The current results will enable us to obtain this sample, because they establish a shared sequence variant that is suitable to use as a hybridization probe to screen for the required 1-Myr-old LINE-1 elements.
MATERIALS AND METHODS
SSL1-PCR: A mouse of the M. spretus Spain strain SPRET/Ei was obtained from J. Nadeau (The Jackson Laboratory, Bar Harbor, Maine). Mouse DNA was prepared as described by RIKKE et al. (1991). M. spretus LINE-1 elements were amplified with oMS496HIII and cloned as described by CASAVANT and HARDIES (1993). Those filters to be probed with oMs6936 (CATTGCACACACTAG) or oSCHla (TCATTTACATTTCCAATGCT) were prehybridized and hybridized as described by RIKKE et al. (1991). Each filter was washed 3 times for 20 min in 6× SSC at 39° for oMs6936 and room temperature for oSCHla. The 3'-untranslated region probe (CASAVANT and HARDIES 1993) was nick translated, hybridized and washed as described by WAHL et al. (1979). Filters were exposed overnight on Kodak X-OMAT AR film.
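The wash temperatures can be put in context with a common rule of thumb for short oligonucleotides, the Wallace rule (2° for each A or T plus 4° for each G or C). The text does not say that this rule was used to choose the wash conditions; the sketch below is only an illustrative estimate, and the function name is hypothetical.

```python
def wallace_tm(oligo):
    """Rule-of-thumb dissociation temperature (degrees C) for a short oligonucleotide.

    Wallace rule: 2 degrees per A/T plus 4 degrees per G/C. This is an
    approximation for short probes in high-salt washes, not a measured value.
    """
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    if at + gc != len(oligo):
        raise ValueError("oligo contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


# Probe sequences as given in the text.
tm_oMs6936 = wallace_tm("CATTGCACACACTAG")       # 15-mer
tm_oSCHla = wallace_tm("TCATTTACATTTCCAATGCT")   # 20-mer
print(tm_oMs6936, tm_oSCHla)  # prints: 44 52
```

Under this estimate, the 39° wash for oMs6936 sits about 5° below its predicted value, which is consistent with a wash meant to discriminate a single-base variant, whereas the room-temperature wash for oSCHla is far below its predicted value, consistent with the low stringency described in the results.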
DNA sequencing: The individual clones were sequenced using the dideoxy chain termination method (SANGER et al. 1977). Both strands of each reported sequence were completed using internal LINE-1 primers.
Construction of LINE-1 phylogenetic tree: Comparative sequence data from M. spretus and M. domesticus elements were used to construct the tree. All of the elements called M. domesticus originate from inbred strains that are actually hybrids between Mus musculus musculus and Mus musculus domesticus (BONHOMME and GUENET 1989). M. domesticus sequences were L1MdA2 (LOEB et al. 1986), L1MdA13 (SHEHEE et al. 1987), R5 (GEBHARD et al. 1982), L1Md1 and L1Md4 (VOLIVA et al. 1983, with corrections from SHEHEE et al. 1989), and L1Md17 (SHEHEE et al. 1989). The M. spretus sequences were SPR48, SPR49, SPR68 and D13SH1 (CASAVANT and HARDIES 1993), and SC2, SC3, SC4, SC5, SC6 and SC7 (this report). The tree was constructed using the method of maximum parsimony (FITCH 1977). There were minimal discordancies; those causing uncertainty in placement of branches are explicitly discussed in the results.
The sequence of the LINE-1 ancestor at the time of the M. spretus/M. domesticus speciation was estimated for display purposes only. It is essentially a consensus after the removal of species-specific variant bases.
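The parsimony criterion can be illustrated by counting the minimum number of base changes that a given tree requires at one aligned position. The sketch below uses the standard Fitch set-intersection procedure on a rooted binary tree written as nested tuples; it is not the authors' tree-building procedure, and the second topology is a hypothetical alternative included only for comparison. The base states follow the Table 1 row for position 7024 (variant C in SC3 and SC4), with the ancestral T assumed for SC7 and SC2.

```python
def fitch_changes(tree, states):
    """Return (candidate base set, minimum number of changes) for one site.

    tree:   a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    states: dict mapping leaf name -> observed base at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, states)
    right_set, right_cost = fitch_changes(right, states)
    common = left_set & right_set
    if common:
        # The two subtrees can agree on a base: no extra change is needed here.
        return common, left_cost + right_cost
    # No base in common: one substitution must have occurred on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


site_7024 = {"SC3": "C", "SC4": "C", "SC7": "T", "SC2": "T"}
nested_tree = ((("SC3", "SC4"), "SC7"), "SC2")       # SC3/SC4 grouped together
alternative_tree = (("SC3", "SC7"), ("SC4", "SC2"))  # hypothetical competing topology

_, nested_cost = fitch_changes(nested_tree, site_7024)
_, alternative_cost = fitch_changes(alternative_tree, site_7024)
```

The grouping of SC3 with SC4 explains this position with one change, while the alternative requires two; a maximum parsimony tree minimizes the sum of such counts over all informative positions.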
RESULTS
Collecting M. spretus LINE-1 elements: Members of the Ms496 family in both M. spretus and in an interspecific hybrid mouse were recovered by subfamily-specific LINE-1 PCR (SSL1-PCR) (CASAVANT and HARDIES 1993). In this method, a PCR primer specific for the defining Ms496 sequence variation in the family is paired with the "bubble" anchor of RILEY et al. (1990) located within the flanking DNA. This allowed collection of sequences from the 3'-most 489 bp of the elements, which is the richest region for informative sequence variants (RIKKE et al. 1991).
[FIGURE 1. Alignment of the 3' end sequences of the M. domesticus and M. spretus LINE-1 elements (R5, L1Md1, SC5, SC6, L1MdA2, L1MdA13, L1Md17, L1Md4, SPR68, SPR48, SPR49, SC2, SC7, SC3, SC4 and D13SH1) against the estimated ancestral sequence. The alignment itself is not legible in this copy.]
products was probed with an oligonucleotide (oSCHla) from the extreme 3' end of LINE-1 (position 7337-7356 in Figure 1). Post-hybridization washing was conducted at low stringency so that mismatches could be tolerated. Prior to probing with oSCHla the library was hybridized to a nick translated fragment containing the 3' end of the LINE-1 element to determine which clones contained LINE-1 sequence. The sequences of six of the oSCHla positive LINE-1 elements (SC2-7) are shown in Figure 1. Each contains a complete 3' end except SC5, which has a poly(A) tail beginning 91 bp before the usual polyadenylation site. SC5 passed the screen with oSCHla because there is a second LINE-1 element with an intact 3' end in the immediate flanking region (not shown).
Defining M. spretus-specific variants: To detect M. spretus-specific sequence variants, the sequences were aligned with six representative LINE-1 elements from M. domesticus. Three elements, L1MdA2 (LOEB et al. 1986), L1MdA13 (SHEHEE et al. 1987) and L1Md4 (VOLIVA et al. 1983), were previously described as having spread relatively recently in M. domesticus. A fourth element, L1Md17 (SHEHEE et al. 1989), was chosen because its sequence compared well with the younger elements. The remaining two, L1Md1 (VOLIVA et al. 1983) and R5 (GEBHARD et al. 1982), are known to have arisen shortly before the speciation between M. spretus and M. domesticus. A phylogenetic tree was derived (Figure 2) which placed the new M. spretus sequences within this known framework. This tree served to segregate the elements by common content of sequence variants.
SC2, SC3, SC4 and SC7 are organized as if descended from a single evolving sequence, although we believe that the evolving sequence actually represents a lineage of ancestral propagating elements (HARDIES and RIKKE 1990). They contain shared sequence differences as expected for members of the Ms496 family. Therefore, these sequences were presumably primed specifically by the oMS496 primer, and should be M. spretus-specific inserts since the 496 mutation is known to have arisen after speciation. Sequence variants shared by SC2, SC3, SC4 and SC7 represent mutations spread during the expansion of this family, and are candidates for M. spretus-specific shared sequence variants. The M. spretus and M. domesticus sequences in Figure 1 are arranged in the same order as on the tree, and compared to the common ancestral sequence computed for the M. spretus/M. domesticus split. Candidates for sequence variants defining M. spretus and M. domesticus-specific subfamilies have been boxed in Figure 1 and the candidates for M. spretus-specific variants have been tabulated in Table 1. Four of the candidates (6936, 7265, 7285 and 7024) should definitely be M. spretus-specific by the following argument. They appear on the tree after SC2, which is presumed to have carried the 496 mutation, which in turn is known to have arisen after speciation. Further verification is provided by three previously characterized 3' truncated members of the Ms496 family (mMsPCR48, mMsPCR49 and mMsPCR68; CASAVANT and HARDIES 1993) which are indicated as SPR48, SPR49 and SPR68 on Figures 1 and 2. Only general positions on the tree for these four elements are shown because the informative positions 3' of 6987 have been deleted by truncation. However, they clearly arose after 6920 and before 6936. Therefore, by mapping above 6936, they confirm this variant's spretus-specific character. Also, since 6936 maps above 7024, 7024 must be M. spretus-specific.
FIGURE 2. Molecular phylogenetic tree of M. domesticus and M. spretus LINE-1 sequences. Dashed brackets indicate the range of uncertainty in placing those sequences that are truncated, and therefore missing some informative positions. Branch lengths are not drawn to scale. Only shared sequence variants are indicated. The stippled bar, representing the speciation between M. spretus and M. domesticus, has been extended to encompass shared sequence variants for which species specificity and branch order is uncertain.
The remaining four candidates (6920, 7102, 7146 and 7224) are shared by all four of SC2, SC3, SC4 and SC7. These variants, therefore, have arisen early in the ancestral lineage, and should be characteristic of most members of the Ms496 family, as are the defining variants, Ms496 and Ms416. Whether they came before or after Ms496 cannot be determined within this sample of sequences. Two of them (7102 and 7224) have the dis-
TABLE 1
Candidates for M. spretus-specific variants

Position^a     Base change (ancestor:variant)    M. spretus clones^b
6920           C:T                               SC2, 3, 4, 7, SPR48, 49, 68 and D13SH1
6936           T:C                               SC3, 4, 7, D13SH1
7024^c         T:C                               SC3, 4
7102^d,e       T:C                               SC2, 3, 4, 7
7146           G:T                               SC2, 3, 4, 7
7224^e         T:C                               SC2, 3, 4, 7
7265           A:G                               SC3, 4, 7
7285-7306      22 bp:DEL                         SC3, 4, 7

a The coordinates are based on L1MdA2 (LOEB et al. 1986).
b The clones listed are those that inserted after speciation.
c Clones SPR49 and D13SH1 are truncated and therefore do not contain sequence at this site or any other site 3'.
d Clones SPR48 and SPR68 are truncated and therefore do not contain sequence at this site or any other site 3'.
e M. spretus specificity of these positions is uncertain, because L1Md4 also contains the putative M. spretus base: at 7102, 13 additional mouse sequences did not contain this variant; at 7224, of 17 additional M. domesticus LINE-1 elements, only L1Md2, an older element, did contain this variant.
other M. domesticus elements (Table 1). L1Md4 had been assigned as one of the youngest M. domesticus elements because it has no unshared mutations in a more 5' sequenced segment (MARTIN et al. 1985; HARDIES et al. 1986); and it maps in Figure 2 as part of a M. domesticus-specific family. However, the position of L1Md4 is based on only three positions (6954, 6938 and 6955) that disagree with 7102 and 7224. Because L1Md4 is young and relatively free of unshared differences, it should not be expected to be involved in a lot of coincidences, especially two aligning with the same M. spretus subfamily. So we consider the placement of L1Md4, as well as the character of all five variants in question, to be unresolved at present. This region of uncertainty is marked by the wide stippled region in Figure 2.
We provisionally consider the last two candidates (6920 and 7146) to be M. spretus-specific, although confirmation is lacking. This makes a total of six M. spretus-specific shared sequence variants that are characteristic of the consecutive subfamilies and can be used to order elements with respect to their age. Of these, Ms6936 and Ms7024 are supported the best.
The assignment of variant 6936 as M. spretus-specific conflicts with information derived from the remaining two new sequences, SC5 and SC6. These two elements fell on the tree together with older elements that inserted into the genome prior to speciation between M. spretus and M. domesticus. They share sequence variants with L1Md1 and R5 at positions 6953, 6970, 7037, 7055 and for SC5 at 7156. These are positions that are known to have changed in the ancestral lineage before the speciation of M. spretus and M. domesticus, and before the expansion of the Ms496 family. SC6 has three discordant variants for this assignment: the M. spretus-specific bases at 6920 and 6936, and the absence of the base characteristic of older elements at 6913. This is too much to explain by coincidence. SC6 could have arisen by in vivo recombination, but there is also an obvious PCR artifact consistent with this pattern. We believe that the oMS496 primer did prime and partially extend on a M. spretus-specific element as intended, but the strand was not completed before it dissociated. This incomplete strand then primed on another LINE-1 element in a later cycle. Thus SC6 is an artificial recombinant between a M. spretus-specific LINE-1 and an older LINE-1 element. Regardless of the etiology of SC6, we have treated it as two separate entities (SC6 left and SC6 right) in our analyses. In contrast, SC5 appears to truly be an older element on which the oMS496 primer primed nonspecifically. SC5 shares a single M. spretus-specific variant at 6936, which we believe is a coincidence. Further information confirming that variant 6936 is truly M. spretus-specific is presented below.
How to search for small subfamilies: We are seeking a way to determine if the Ms496 family is split into smaller concurrent subfamilies, thus indicating independent progenitors. Concurrent subfamilies indicative of multiple independent progenitors are exemplified by the state of the M. domesticus sequences in Figure 2. To distinguish separate propagating elements, at least two progeny from each parent must be sampled; that is, clades must be observed in the data. For example, L1Md4 and L1Md17 share different variants (positions 6938 and 6955) than L1MdA2 and L1MdA13. This arrangement indicates that the ancestor of L1Md4/L1Md17 and L1MdA2/L1MdA13 split into two separate propagating elements whose copies form two separate clades. This particular example is flawed, because information from the key variants is self-contradictory, as discussed above. Nonetheless, this pattern of mutually exclusive shared sequence variants is the key to identifying concurrent subfamilies indicative of independent progenitors.
In contrast to the appearance of the M. domesticus sequences, the M. spretus subfamilies appear to be consecutive rather than concurrent. For example, SC3/SC4 is a more recent subfamily of the larger group SC3/SC4/SC7/SC2. Since all four repeats could have been descended from the same parent, SC3/SC4 is not a separate clade. Therefore clades must be observed to establish the existence of truly separate parents.
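The difference between consecutive and concurrent subfamilies can be expressed as a comparison of the sets of elements that carry two shared variants. The following is a minimal sketch; the function name and the three labels are hypothetical. The carrier sets follow Table 1 and the text, and the label "A-type variants" is only a placeholder for the variants that L1MdA2 and L1MdA13 share. Nested sets fit a single evolving lineage, mutually exclusive sets suggest separate progenitors, and partially overlapping sets signal a discordance such as the one involving L1Md4.

```python
def relate(carriers_a, carriers_b):
    """Classify how the carrier sets of two shared variants relate to each other."""
    a, b = set(carriers_a), set(carriers_b)
    if not a or not b:
        raise ValueError("each variant needs at least one carrier")
    if a <= b or b <= a:
        return "nested"        # consecutive subfamilies: one lineage may explain both
    if not (a & b):
        return "exclusive"     # concurrent subfamilies: candidate separate clades
    return "conflicting"       # partial overlap: coincidence, recombination or artifact


carriers = {
    "6936": {"SC3", "SC4", "SC7"},
    "7024": {"SC3", "SC4"},
    "7102": {"SC2", "SC3", "SC4", "SC7", "L1Md4"},  # L1Md4 also carries this base
    "6938": {"L1Md4", "L1Md17"},
    "A-type variants": {"L1MdA2", "L1MdA13"},       # placeholder label, see lead-in
}

spretus_pattern = relate(carriers["6936"], carriers["7024"])
domesticus_pattern = relate(carriers["6938"], carriers["A-type variants"])
l1md4_pattern = relate(carriers["7102"], carriers["6938"])
```

With these sets the M. spretus variants come out as nested, the two M. domesticus groups as exclusive, and the comparison of 7102 with 6938 as conflicting, which mirrors the unresolved placement of L1Md4 discussed above.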
TABLE 2
Number of private base differences per element

Element        No. of private mutations/length
R5             13/382 (3.4)^a
SC5            9/396 (2.3)^a
L1Md1          16/489 (3.2)^a
SC6 right      30/408 (7.3)^a
L1MdA2         0/489
L1MdA13        0/489
L1Md17         4/489 (0.8)^a
L1Md4          9/489 (1.8)^a
SPR68          1/163 (0.6)
SPR48          3/163 (1.8)
SPR49          1/115 (0.9)
SC2            4/489 (0.8)
SC7            0/467
SC3            1/467 (0.2)
SC4            1/467 (0.2)
SC6 left       0/79
D13SH1         0/78

Numbers in parentheses are percent.
a Private mutations were determined by comparing these sequences with an additional 10-18 M. domesticus LINE-1 sequences.
reject the existence of small concurrent subfamilies depends on how big the subfamilies are. If there are a lot of small subfamilies, then a large enough sample is needed to get two members of each. To get a big enough sample of 1-Myr-old LINE-1 elements, we will need a probe to screen for that subset of the Ms496 family which itself is only 10% of randomly selected LINE-1s (RIKKE et al. 1991).
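The dependence of the required sample on subfamily size can be sketched with a simple binomial model. The model is an assumption made here for illustration and does not appear in the text: elements are taken to be sampled independently and at random, and a given subfamily is taken to make up a fraction f of the screened population. The function names are hypothetical.

```python
def prob_at_least_two(n, f):
    """Probability that n randomly sampled elements include 2 or more members
    of a subfamily that makes up fraction f of the screened population."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    if n < 2:
        return 0.0
    none = (1.0 - f) ** n                        # no member sampled
    exactly_one = n * f * (1.0 - f) ** (n - 1)   # exactly one member sampled
    return 1.0 - none - exactly_one


def min_sample_size(f, target):
    """Smallest n for which prob_at_least_two(n, f) reaches the target probability."""
    if not 0.0 < f <= 1.0 or not 0.0 < target < 1.0:
        raise ValueError("need 0 < f <= 1 and 0 < target < 1")
    n = 2
    while prob_at_least_two(n, f) < target:
        n += 1
    return n
```

For example, `min_sample_size(0.05, 0.9)` gives the number of sequenced elements needed, under these assumptions, to have a 90% chance of seeing a clade for a subfamily that represents 5% of the screened elements. The smaller the subfamilies, the larger the sample that is needed.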
Further characterizing Ms6936 as a key variant: To determine which of our variants is about 1 Myr old requires estimating the time frame when each of the different variants was spread. Since there are several variants occurring since the speciation, they should divide this time into smaller intervals. We have estimated these intervals by counting differences up from the bottom of the tree and converting those branch lengths to ages by the relationship that pseudogenes diverge at about 0.5%/Myr (LI et al. 1981). The number of unshared or private base differences in each sequence is tabulated in Table 2. To estimate the age of variant Ms7024, we see that SC3 and SC4 average 0.2% or 0.4 Myr. To estimate the lower bound of the age of the variants Ms6936, Ms7265 and Ms7285, we add in SC7 plus the extra mutation of Ms7024 to each of SC3 and SC4, yielding an average divergence of 0.3% or 0.5 Myr. The upper bound of Ms6936, Ms7265 and Ms7285 includes branch SC2 plus the two shared substitutions added to the lengths of SC3, SC4 and SC7, yielding 0.7% or 1.4 Myr. If we continue to add in the two to four variants up to the speciation, we get the speciation occurring at 2.5-3.3 Myr, consistent with other estimates (BONHOMME et al. 1984). These age estimates, although highly approximate, support the conclusion that any of the set Ms6936, Ms7265, Ms7285 and Ms7024 could be used to screen for LINE-1 elements originating in the last 1 Myr.
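The conversion from branch length to age is a division by the pseudogene divergence rate. The sketch below assumes only the 0.5%/Myr rate quoted in the text and the Table 2 counts; the function names are hypothetical, and the unrounded results differ slightly from the rounded figures given above.

```python
PSEUDOGENE_RATE = 0.5  # percent divergence per Myr, the rate quoted from LI et al. (1981)


def percent_divergence(differences, length):
    """Percent of positions that differ, e.g. private mutations over sequenced length."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * differences / length


def age_myr(percent, rate=PSEUDOGENE_RATE):
    """Approximate age in Myr implied by a percent divergence at the given rate."""
    return percent / rate


# SC3 and SC4 each carry 1 private difference in 467 bp (Table 2).
sc3_percent = percent_divergence(1, 467)
sc3_age = age_myr(sc3_percent)
```

Here `sc3_percent` rounds to 0.2 and `sc3_age` to about 0.4 Myr, matching the estimate for Ms7024. The same two functions apply to the other intervals once the shared substitutions on each branch are added to the private counts.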
Of these we have further investigated variant 6936 by challenging it to find members of an artificially created M. spretus LINE-1 subfamily. An interspecific congenic mouse containing 1% M. spretus DNA and 99% M. domesticus DNA (RIKKE et al. 1993) was made into a subfamily-specific LINE-1 PCR library and screened with a hybridization probe containing variant Ms6936. This resulted in the isolation of a LINE-1 element, D13SH1, that was then confirmed to have originated from the M. spretus differential locus (CASAVANT and HARDIES 1993). The sequence of D13SH1 (Figure 1) contains both the 6936 and 6920 shared sequence variants, and therefore clusters with SC3, SC4 and SC7 (Figure 2). The other M. spretus-specific variants are not available in D13SH1 because the element has been truncated. The ability to detect D13SH1 confirms two things about variant Ms6936. First, Ms6936 really is M. spretus-specific, thus mitigating any concern raised by its presence in SC5 and SC6. Otherwise, we would have isolated LINE-1s from the 99% of the DNA congenic with M. domesticus. Second, Ms6936 still detects a substantial size subfamily, otherwise we would not have found any members within the small M. spretus differential locus.
Finally, Ms6936 is a superior variant to use than Ms7024 for the following reason. To recognize small subfamilies as separate clades underneath the umbrella of Ms6936, each small subfamily will need to have obtained a distinguishing mutation in the sequenced region. Variant Ms6936 is old enough for that to have happened, as evidenced by the existence of variant Ms7024. On the other hand, variant Ms7024 may not be old enough.
As a test of the usefulness of Ms6936, we have begun to collect young M. spretus LINE-1 sequences using Ms6936 as a probe. A library of restriction fragments covering coordinates 6621-7152 was prepared without PCR amplification and probed with an oligonucleotide based on Ms6936. Eighteen positive clones were sequenced (data not shown). These were found to fall in the targeted age group, and confirmed the organization of the sequence variants within this region (6920, 6936, 7024, 7102 and 7146). Therefore, the PCR method used in this paper was successful in defining and estimating the age of this series of variants without serious interference by any PCR artifact.
Candidates for M. domesticus-specific shared sequence variants: Our analysis revealed 11 candidates for new M. domesticus-specific shared sequence variants (Figure 2). The large number of shared sequence variants between L1MdA2 and L1MdA13 indicates that they have descended from a single evolving sequence long after the M. spretus/M. domesticus split, and that many of the variants should be truly M. domesticus specific. There are no private mutations to count in L1MdA2 or L1MdA13 in our 489-bp region, but we can use the full
1987) which results in 24 differences evenly distributed between the two elements over 6372 bp to estimate an age of about 0.38 Myr for each. Therefore, L1MdA2 does seem to be a good prototype of a very young mouse LINE-1 element.
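The age quoted for L1MdA2 and L1MdA13 can be reproduced with a short calculation, assuming that "evenly distributed" means 12 of the 24 differences are assigned to each element and that the pseudogene rate of 0.5%/Myr used earlier applies:

```latex
\frac{24/2}{6372} \approx 0.188\%,
\qquad
\frac{0.188\%}{0.5\%/\mathrm{Myr}} \approx 0.38\ \mathrm{Myr}
```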
DISCUSSION
We have further characterized a 9000 member M. spretus-specific family of LINE-1 elements. A total of six new shared sequence variants have been defined that are specific to this family. These sequence variants arose within the actively propagating members of this family at different times spread over the last 3-5 Myr. For example, variant Ms6936 began spreading at some time after variant Ms6920, and variant Ms7024 began spreading at some time after Ms6936. The "Ms" prefix means that the variant clearly spread at some time after speciation. By using probes for these various variants, it would be possible to screen for elements originating in a particular interval of time. The question is, what time interval should be sampled to provide some fundamentally new information about the biology of LINE-1, and will these variants allow us to sample that interval of time?
A key unresolved issue about the dynamics of LINE-1 propagation concerns the transition from a single newly mutated propagating element to a large family distributed widely in both the genome and in the mouse population. As described in the introduction, it is known that large families have amplified at various points in the past, sometimes splitting in two, and sometimes dying out as new families arose. What is unclear is how we should relate the small six-member active family recently observed in humans (DOMBROSKI et al. 1991) to the larger families. Since there are up to 10,000 full length and potentially propagating LINE-1 elements in the mouse, there could be a lot of little families. One could imagine that the major families are an envelope composed of many small families with distinct propagating elements; or one could imagine that different propagating elements somehow compete with each other, and that a small family represents an element that is losing the competition.
In the mouse, we believe that small subfamilies either fail or expand to larger families within about 1 Myr. As described in the results, we examined our M. spretus shared sequence variants to see which could best be used as a probe to screen for LINE-1 elements in the 1-Myr age range. We identified variant Ms6936 as having the following desirable properties: (1) it is experimentally verified to be truly M. spretus-specific (CASAVANT and HARDIES 1993), (2) it marks a propagating progenitor that was about 1 Myr old, (3) it detects a relatively large subfamily and (4) it is old enough that any smaller subfamilies existing under the Ms6936 umbrella will have picked up individualistic sequence variants and will be distinguishable as clades. Thus we have identified Ms6936 as a key probe for obtaining a sample that may tell about the dynamics relating small families to large families.
This work was supported by National Institutes of Health grant HG00190.
LITERATURE CITED
ADEY, N. B., S. A. SCHICHMAN, C. A. HUTCHISON III and M. H. EDGELL, 1991 Composite of A and F-type 5' terminal sequences defines a subfamily of mouse LINE-1 elements. J. Mol. Biol. 221: 367-373.
BONHOMME, F., and J.-L. GUENET, 1989 The wild house mouse and its relatives, pp. 649-662 in Genetic Variants and Strains of the Laboratory Mouse, Ed. 2, edited by M. F. LYON and A. G. SEARLE. Oxford University Press, New York.
BONHOMME, F., J. CATALAN, J. BRITTON-DAVIDIAN, V. M. CHAPMAN, K. MORIWAKI et al., 1984 Biochemical diversity and evolution in the genus Mus. Biochem. Genet. 22: 275-303.
CASAVANT, N. C., and S. C. HARDIES, 1993 Targeted cloning of a subfamily of LINE-1 elements by subfamily-specific LINE-1-PCR. Mamm. Genome 4: 193-201.
DEININGER, P. L., M. A. BATZER, C. A. HUTCHISON III and M. H. EDGELL, 1992 Master genes in mammalian repetitive DNA amplification. Trends Genet. 8: 307-311.
DOMBROSKI, B. A., S. L. MATHIAS, E. NANTHAKUMAR, A. F. SCOTT and H. H. KAZAZIAN, JR., 1991 Isolation of an active human transposable element. Science 254: 1805-1807.
EDGELL, M. H., S. C. HARDIES, D. D. LOEB, W. R. SHEHEE, R. W. PADGETT et al., 1987 The L1 family in mice. Prog. Clin. Biol. Res. 251: 107-129.
FITCH, W. M., 1977 On the problem of discovering the most parsimonious tree. Am. Nat. 111: 223-257.
GEBHARD, W., T. MEITINGER, J. HÖCHTL and H. G. ZACHAU, 1982 A new family of interspersed repetitive DNA sequences in the mouse genome. J. Mol. Biol. 157: 453-471.
HARDIES, S. C., and B. A. RIKKE, 1990 A selfish retrotransposition model for rodent LINE-1. UCLA Symp. Mol. Cell. Biol. New Ser. 122: 127-134.
HARDIES, S. C., S. L. MARTIN, C. F. VOLIVA, C. A. HUTCHISON III and M. H. EDGELL, 1986 An analysis of replacement and synonymous changes in the rodent L1 repeat family. Mol. Biol. Evol. 3: 109-125.
HUTCHISON, C. A., III, S. C. HARDIES, D. D. LOEB, W. R. SHEHEE and M. H. EDGELL, 1989 LINEs and related retroposons, pp. 593-617 in Mobile DNA, edited by D. E. BERG and M. M. HOWE. American Society for Microbiology, Washington, D.C.
JUBIER-MAURIN, V., G. CUNY, A.-M. LAURENT, L. PAQUEREAU and G. ROIZES, 1992 A new 5' sequence associated with mouse L1 elements is representative of a major class of L1 termini. Mol. Biol. Evol. 9:
JURKA, J., 1989 Subfamily structure and evolution of the human L1 family of repetitive sequences. J. Mol. Evol. 29: 496-503.
KASS, D. H., F. G. BERGER and W. D. DAWSON, 1992 The evolution of coexisting highly divergent LINE-1 subfamilies within the rodent genus Peromyscus. J. Mol. Evol. 35: 472-485.
LI, W.-H., T. GOJOBORI and M. NEI, 1981 Pseudogenes as a paradigm of neutral evolution. Nature 292: 237-239.
LOEB, D. D., R. W. PADGETT, S. C. HARDIES, W. R. SHEHEE, M. B. COMER et al., 1986 The sequence of a large L1Md element reveals a tandemly repeated 5' end and several features found in retrotransposons. Mol. Cell. Biol. 6: 168-182.
MARTIN, S. L., C. F. VOLIVA, S. C. HARDIES, M. H. EDGELL and C. A. HUTCHISON III, 1985 Tempo and mode of concerted evolution in the L1 repeat family of mice. Mol. Biol. Evol. 2: 127-140.
PASCALE, E., C. LIU, E. VALLE, K. USDIN and A. V. FURANO, 1993 The evolution of long interspersed repeated DNA (L1, LINE 1) as revealed by the analysis of an ancient rodent L1 DNA family. J. Mol. Evol. 36: 9-20.
RIKKE, B. A., and S. C. HARDIES, 1991 LINE-1 repetitive DNA probes for species-specific cloning from Mus spretus and Mus domesticus genomes. Genomics 11: 895-904.
RIKKE, B. A., L. D. GARVIN and S. C. HARDIES, 1991 Systematic identification of LINE-1 repetitive DNA sequence differences having species specificity between Mus spretus and Mus domesticus. J. Mol. Biol. 219: 635-643.
RIKKE, B. A., L. H. PINTO, M. B. GORIN and S. C. HARDIES, 1993 Mus spretus-specific LINE-1 DNA probes applied to cloning of the murine pearl locus. Genomics 15: 291-296.
RILEY, J., R. BUTLER, D. OGILVIE, R. FINNIEAR, D. JENNER et al., 1990 A novel rapid method for the isolation of terminal sequences from yeast artificial chromosome (YAC) clones. Nucleic Acids Res. 18: 2887-2890.
SANGER, F., S. NICKLEN and A. R. COULSON, 1977 DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467.
SCHICHMAN, S. A., D. M. SEVERYNSE, M. H. EDGELL and C. A. HUTCHISON III, 1992 Strand-specific LINE-1 transcription in mouse F9 cells originates from the youngest phylogenetic subgroup of LINE-1 elements. J. Mol. Biol. 224: 559-574.
SCHICHMAN, S. A., N. B. ADEY, M. H. EDGELL and C. A. HUTCHISON III, 1993 L1 A-monomer tandem arrays have expanded during the course of mouse L1 evolution. Mol. Biol. Evol. 10: 552-570.
SCOTT, A. F., B. J. SCHMECKPEPER, M. ABDELRAZIK, C. T. COMEY, B. O'HARA et al., 1987 Origin of the human L1 elements: proposed progenitor genes deduced from a consensus DNA sequence. Genomics 1: 113-125.
SHEHEE, W. R., S.-F. CHAO, D. D. LOEB, M. B. COMER, C. A. HUTCHISON III et al., 1987 Determination of a functional ancestral sequence and definition of the 5' end of A-type mouse L1 element. J. Mol. Biol. 196: 757-767.
SHEHEE, W. R., D. D. LOEB, N. B. ADEY, F. H. BURTON, N. C. CASAVANT et al., 1989 The nucleotide sequence of the BALB/c mouse β-globin complex. J. Mol. Biol. 205: 41-62.
SKOWRONSKI, J., and M. F. SINGER, 1986 The abundant LINE-1 family of repeated DNA sequences in mammals: genes and pseudogenes. Cold Spring Harbor Symp. Quant. Biol. 51: 457-464.
VOLIVA, C. F., C. L. JAHN, M. B. COMER, C. A. HUTCHISON III and M. H. EDGELL, 1983 The L1Md long interspersed repeat family in the mouse: almost all examples are truncated at one end. Nucleic Acids Res. 11: 8847-8859.
WAHL, G. M., M. STERN and G. R. STARK, 1979 Efficient transfer of large DNA fragments from agarose gels to diazobenzyloxymethyl-paper and rapid hybridization by using dextran sulfate. Proc. Natl. Acad. Sci. USA 76: 3683-3687.