Copyright 0 1993 by the Genetics Society of America
Effects of Transposable Elements
on
the Expression of theforked Gene of
Drosophila melanogaster
Kenley K. Hoover, Andy J. Chien and Victor G. Corces
Department of Biology, The Johns Hopkins University, Baltimore, Maryland 21218 Manuscript received April 13, 1993
Accepted for publication June 23, 1993
ABSTRACT
The products of the forked gene are involved in the formation and/or maintenance of a temporary fibrillar structure within the developing bristle rudiment of Drosophila melanogaster. Mutations in the forked locus alter this structure and result in aberrant development of macrochaetae, microchaetae and trichomes. The locus has been characterized at the molecular level by walking, mutant character- ization and transcript analysis. Expression of the six forked transcripts is temporally restricted to mid- late pupal development. At this time, RNAs of 6.4, 5.6, 5.4, 2.5, 1.9 and 1.1 kilobases (kb) are detected by Northern analysis. T h e coding region of these RNAs has been found to be within a 21-
kb stretch of genomic DNA. The amino terminus of the proteins encoded by the 5.4- and 5.6-kb forked transcripts contain tandem copies of ankyrin-like repeats that may play an important role in
the function of forked-encoded products. T h e profile of forked RNA expression is altered in seven spontaneous mutations characterized during this study. Three forked mutations induced by the insertion of the gypsy retrotransposon contain a copy of this element inserted into an intron of the gene. In these mutants, the 5.6-, 5.4- and 2.5-kbforked mRNAs are truncated via recognition of the polyadenylation site in the 5’ long terminal repeat of the gypsy retrotransposon. These results help explain the role of the forked gene in fly development and further our understanding of the role of transposable elements in mutagenesis.
D
EVELOPMENT of macrochaetae and micro- chaetae in Drosophila melanogaster involves cho- reographed interactions between four sister cells. These cells are derived from two successive mitotic divisions of a sensory mother cell. At approximately 15 hr after puparium formation, the two predecessor cells generated from the first of these divisions and located within the epidermal layer divide to give rise to two sets of cells. One pair will differentiate to form the external sensory neuron and its protective sheath cell, the thecogen. The other set is composed of two precursor cells, the socket-forming tormogen cell and the bristle-secreting trichogen cell. Between 18 and30 hr of pupal development, the tormogen and tri- chogen precursor cells grow larger than surrounding epidermal cells in volume and become polyploid through a series of endomitotic divisions (POODRY
1980). Bristle secretion commences at approximately
30 hr after puparium formation and proceeds rapidly during the next 25 hr of development. Socket for- mation begins after bristle secretion has started. The socket cell has completely circumscribed the bristle by 48 hr after puparium formation.
Growth of the developing bristle occurs predomi- nantly at its tip and proceeds perpendicular to the body axis (LEES and WADDINCTON 1942; LEES and PICKEN 1945). Histological studies indicate that bristle morphogenesis depends upon the structural grouping
Genetics 135: 507-526 (October, 1993)
and orientation of microtubules and microfibrils within these cells. In particular, the formation and arrangement of fiber bundles associated with the cell surface of the developing bristle rudiment plays a critical role in bristle development. These intracyto- plasmic fiber bundles are closely applied to the cell surface and are oriented parallel to the long axis of the bristle rudiment. Cross sectional analysis of the bristle rudiments of Stubble and singed mutants shows an alteration in the amount, shape, and distribution of these fiber bundles (OVERTON 1967). In contrast, these mutants do not show abnormalities in the more centrally located microtubule structures when com- pared to wild-type bristle sections. Cross-sectional analysis of the bristle rudiments in the forked
(f)
mutant f36a 40 hr after puparium formation fails todetect these longitudinally arrayed fiber bundles (N. PETERSON, personal communication). Theforked gene (1-56.7) is located in chromosomal subdivision 15F. The product of this gene is essential during a 24-hr period just prior to bristle secretion (DUDICK, WRIGHT and BROTHERS 1974). Failure of theforked product to be made in wild-type quantities results in bristles that are shortened and aberrantly bent or forked with ends that may be sharply twisted or split. The severity of the bristle phenotype ranges from the relatively mild f
”’
andf3 alleles to the severefhd,f360 and f 14’ alleles.508 K . K. Hoover, A. J. Chien and V. G. Corces
c1 2 3 4 5 6
1
-
, & " L , m e8 9 1 0 1 1 5.4 kb(C)1.9 kb(D)
1.1 kb(E) cDNA done ??X
Dl 2 3 4 5
m
w
cDNA dorm 2W
L 6.4 kb(F)
FIGURE 1.-Structure of theforked locus in wild-type and mutant alleles. The restriction map of theforked region is shown. The identity, insertion site, and orientation of the insertions inf3,f'7k,fhd,f',f5,fK,f360 andf'4k alleles are indicated; arrows above each element show the direction of transcription, arrows in the elements represent inverted repeats, and rectangles signify LTRs of the retrotransposons. Each demarcation of the grid corresponds to 1 kb and is numbered according to the sequence of the locus in Figure 3. Transcription units of the forked locus are presented below the grid. Rectangles indicate exons of the gene; shaded rectangles represent translated coding regions and empty rectangles indicate regions that are transcribed, but not translated. cDNA clones are bracketed beneath or above the transcripts that they contain. Each box (exon) is named according to its location on the transcript.
yeast and Drosophila is the insertion of transposable elements. The mechanism of transposable element mutagenesis upon the affected locus depends on the location of the insertion site with respect to the differ- ent structural and functional domains of the mutated gene. Possible modes of mutagenesis include interfer- ence with enhancer and/or promoter activity, disrup- tion of the coding region, alteration of message proc- essing, and perturbation through more complicated mechanisms inherent to the nature of the element causing the mutation. For example, insertion of the gypsy retrotransposon in the 5' region of the yellow ( y )
or cut (ct) loci affects the rate of transcription of these genes in a tissue-specific fashion (CORCES and GEYER
199 1 ; JACK et al. 199 I), whereas insertion of the copia element in an intron of the white ( w ) gene results in premature truncation of transcripts initiating in the white promoter (ZACHAR et al. 1985; MOUNT, GREEN
and RUBIN 1988; PENG and MOUNT 1990). Character-
ization of the forked locus has been undertaken in order to analyze the mode of gypsy mutagenesis in alleles where the element has inserted into the intron of the gene. The forked locus was cloned by utilizing the gypsy retrotransposon as a tag to screen a genomic library prepared from the DNA of a f ' stock (PARK-
HURST and CORCES 1985; MCLACHLAN 1986). The f
',
f and f alleles of this gene are gypsy-induced and bring the expression of the forked gene under the control of the su(Hw) gene as well as the suppressor of forked [ s u l f ) ] , suppressor of sable [su(s)], suppressor of
white apricot [su(w")], suppressor of purple [su(pr)], sup- pressor of lozenge' [su(lz')], ~ ~ ( l r ' ~ ) , and enhancer of white eosin [en(w")] genes (RUTLEDGE et aZ. 1988; SMITH
T h e Drosophila forked Gene 509
A
-6.4 r 5 . 6 -5.4
-
2.5-1.9
-
1.1FIGURE 2.-Profile of forked mRNA expression during Drosoph- ila development. (A) Ten micrograms of poly(A+) R N A , isolated from different developmental stages of the Canton S stock were electrophoresed on a 1 % agarose-formaldehyde gel. transferred to a nitrocellulose membrane, and hybridized with a 32P-labeled 6.2- kb EcoRI-Sal1 genomic fragment (see Figure 1). Embryos were collected every 24 hr and reared at 22.5". The stage of the flies upon harvest is designated at the top of each lane. The sizes of
forked transcripts are shown in the right margin in kilobases. (B)
The same filter shown in A was hybridized with the Drosophila ras2 gene as a sample loading control. This R N A is expressed at constant levels throughout development (MOZER et al. 1985).
scription initiation level, and su(s), su(w") and su(f) are presumed to affect mRNA splicing and/or stability (SMITH and CORCFS 199 l ) , suggesting that gypsy might affect these various processes when inserted into the forked locus. T h e work reported here describes the
molecular characterization of the forked gene and the effect of gypsy insertion on forked transcription, offer- ing new insights into the mechanisms by which trans- posable elements affect the expression of adjacent genes.
MATERIALS AND METHODS
Fly stocks: Flies were reared at 22.5" and 65% relative humidity on standard molasses/yeast/cornmeal medium. Embryo samples for Northern analysis were collected at 24-
hr intervals and raised at 22.5" until reaching the appropri- ate developmental stage for harvest. The f ', f
',
f 5 and f 36a stocks were obtained from stock centers. T h e f m' mutationwas obtained from T. GERASIMOVA (Institute of Gene Biol-
ogy, Russian Academy of Sciences, Moscow, Russia). T h e
f hd allele was obtained from D. DORER and A. CHRISTENSEN
(Thomas Jefferson University, Philadelphia). T h e f 14' and f
'"
alleles were obtained from A. KIM (Moscow State Uni- versity, Russia). T h e f tuh-l tuh-3 stock was provided by D. KUHN (University of Central Florida, Orlando).RNA preparation: Total RNA was prepared using the sodium dodecyl sulfate (SDS)-phenol technique (SPRADLING and MAHOWALD 1979). Poly(A)+ mRNA was then purified
by oligo-(dT) cellulose chromatography (AVIV and LEDER 1972), fractionated on a 1.0% formaldehyde/agarose gel, and transferred to a nitrocellulose membrane. Filters were hybridized overnight at 42 O in a solution of 5X SSCP ( 1 50
mM NaCI, 15 mM sodium citrate, 20 mM sodium phosphate pH 6.8), 5X Denhardt's solution, 50% formamide, 1 % Sar- cosyl, 100 Kg/ml carrier D N A and 10% dextran sulfate. Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed cDNA library was prepared using the cDNA Synthesis System Plus (Amersham, UK) and X gtlO arms. Random hexamer cDNA clones were generated with the Zap-cDNA synthesis kit (Stratagene, La Jolla).
DNA manipulations: DNA was isolated from Drosophila by homogenizing 1 g of adults in a buffer containing 0.1 M NaCI, 0.2 M sucrose, 10 mM EDTA, 0.5% Triton X and 30 mM Tris-HCI (pH 8.0). The sample was then filtered to remove debris, precipitated, and resuspended in a lysis buffer containing 0.35 M NaCI, 10 mM E D T A , 1 % N-lauroyl sarcosine, and 10 mM Tris-HCI (pH 8.0). D N A was extracted with phenol/chloroform and precipitated in ethanol. Treat- ment of D N A with restriction enzymes, Southern analysis, cloning of D N A into plasmid and phage vectors, library screening, and labeling of D N A by random priming were carried out by standard procedures (SAMBROOK, FRISCH and MANIATIS 1989). Genomic phage libraries were generated from partial Sau3Al digestions cloned into the lambda Dash vector (Stratagene, La Jolla). Sequencing of the forked locus from Canton S (wild-type), mutant alleles, and cDNA isolates utilized the dideoxy chain termination method (SANGER, NICKLEN and COULSON 1977) using Sequenase (United States Biochemical Corp.).
Electron microscopy: Young adults were fixed in a 4 %
glutaraldehyde solution of 0.1 M sodium cacodylate (pH 7.0). The samples were then dried, sputter coated with gold- palladium, and viewed in a Jeol JSM35 scanning electron microscope at a magnification of 95x.
RESULTS
Analysis of transcripts encoded by the forked
gene: A chromosomal walk of contiguous genomic clones was performed in both directions flanking the insertion of the gypsy retrotransposon in the
f'
allele (PARKHURST and CORCES 1985) in order to derive a DNA map of the forked region. This walk spans a distance of 50 kb (Figure 1). T h e 6.2-kb EcoRI-Sal1genomic fragment within which the gypsy is inserted was used as a probe to hybridize to a Northern blot of poly(A+) RNA prepared from various stages during Drosophila development. T h e results (Figure 2) show that the forked locus is temporally expressed during mid-late pupal development, during which time six transcripts are detected. T h e amount of these mRNAs was quantitated with a Molecular Dynamics PhosphorImager. A prominent 2.5-kb transcript ac- counts for 42% of the total forked RNA. Two RNAs of 1.9 and 1.1 kb make up 17 and 18% of the total RNA, respectively. Transcripts of 5.6 and 5.4 kb in length each constitute 10% of the total forked RNA. A rare 6.4-kb RNA constitutes 3% of the total forked RNA.
510 K. K. Hoover, A. J. Chien and V. G. Corces
1 CGAGTGCTGG TTTATCTCGT CAAATACCGT TGCTAGTTTG AGGTCTTTGT GCAGGGTTGT GTTCCTTACG AACCATTCTG CTCTACATM TTTGGCTMT AAGTCATTM GCTACTTTCT TACAAATTGT TMTTATTCT ATTCCTTATT CCTTTCCATC CTGCAGAACC ATAAACGATT ATGCGCTCU TTCG~SITCTG C A C c t a a g c c a c a a c a c c a a a c t a a a c a a t c t g c a c a g c a a c a a g c t c c t c c a c a g c a a t c a a a g c c a c a g c a g c a g c a t c a a t c c g a g c aCaaggatgC
c1
a C t C C C a C C a g c t g g a t t c c t a c f t g g g c g g a g t g g g c a g t a t g a t g a a t c t g g g c g g c a t g g c c a a g t t g g c c a g A T G T ACCATCCGGG GAGCCTTTCC MET1 V r H i s P r o C L y S e r V a l S e r 401 GGCAGCWM GACCCMCCG AGGAGCTCTG GCCCTTCACT ATCCCGCCGC ACGTCGTTGC CTTGATTGTG TCCAACTGCT GGTCGCCGCA TCCGTGGATA
G L V S e r G l u A r g P r o A s n G L v C l V a L a L e U A l a L e u H i s T v r A L a A L a A L a A r g G l v C v s L e u A s K v s V a l G L n L e u L e u V a l A L a A L a S e r V a L A s D l
TTTGGTMGT AGATGAGTTA CTGTCCCAAA GGAGTATAGT ATATGATGAC CAAMCTTGC ATCGCGATTT GAGTGCGGGG TGAAAGCTCT TGCACCAGCC K1, H1
CTTCATTGTT TTTATGTGTC GGGGAAATCG CTCGACGATC GATGGCTCCA CTTTATATTT TATTTTTCCC CCATCTGTCT GCATTTCCAT CGATGCGGTT TCTTTCCATT TACTTGCGAC AATTATCTCA TTAGGGGTTT CTTATCTGAT TCGATTAGTA TAAGAGCTGA u C A A G T W \ A TTATTGGCTG T G ~ T C T A 801 ACTAMTTTG GACTTCGTTA TAMATCCAT TTCGMACGT ATGGTGGTM TTATAGCAAT G C T T T W CAAGATTAAG A G A G ~ C GACCMTGTA
TGTATATATA T T T T G M M A ATTAAAGAAC TGGTAATCAA TATTTTAGAA TACTCTTTTC TTMGCGAAA ATGATATTTG GCAGGTTTAT ATAGATTTCC AAGTATCMT TCCGTACAAA CGTATTATTA AAAGTATTAT ATTGTTCGAT TTATGTACAC CATTTTGCCT AGTTGTAATA ATCATTCAAA TGGAAGCATT TMGTGTATT MGCCTTCGT AACGATCTGA TTATCATAAA CTCAATCAAC GATGATGTCG ATTTGAGAAT TTATCTACCT GGTMGCTGG GAGTACTCAT 1201 TAAATGCACT M T T A M A C T TATTAGCCCA TCCMCAAAG TAGAAAGMT GAAGGCGAGT GGTGAAAAAC TAGGAAAMG GCMGGAAM CCTGGCGMT
A T C T T G W ATCTATCCGC TAAGTGGGTT ACCGATCCGT GGCCAGTGTT TGATTCACCT AAATTCCTGT GCTCCAACAG CAGCACGCGT CCAAAGCTO. TTCATTTAAT TGGGAAATGG GAATCGGAAT CCAACTGGCC AAGGCGAGCC AGCCAAGCGA GATGCACTCC ATCTTCAGCC CTCCTTGGCC CCCTGAATCG
CCTACGATCC CATGCCATCC CATCGAGTCC AGGCCAGGTT TGGTTCTATC GTGTGCTATT TTCAATTGGA GTCAAACTGG MAACTCACA G C G G C A C ~
1601 GCACCGTTGC M C M G T T T T CCTCACTCGC CACCAAGTGG AGCMCTCCG GACTGACTGG CCAAGGAATC AGAATTGGAC TCAGMTCAC AGTCGGMTC C G A G T C A W TCAGTGCCAC TGCCTGATCC CCACCTCCCT CGATCCTCAC CAGCTATATT CATATTTACA MGCGCCAAG GCAGATCMT TGACATGACA ACTGACATGG CGACACAGM AAGAGCGAGA G A T G A W C G AGGGCGGCTC TGTGTATAAA ATGTATTGTA CMTAACTCG TCCCGTTTTA A C G T T M T M
2001 ATTCTATTTT MTTACATTA CATCGCCATT CATTTGGCTT AACTGTGGCC AAAAGTTATT TGCCTTACAA T C T T T T C M C CGACCCTCGT CACMCCGTT TAGGGCTTTT TAAAGTATTT TTATTTAATT ATTTAGGCTA AGGAACMAG GGTATTAAGT ACTGACAATT TTATCACGAG GTATTATTCC TATMTCACC
TWTAGTTTT GCGTACAGAG ACACAAACCG TTTGATAGTT TTGCGTACAG AGACACAACG ATCCAAACAG CTGGTGCTTC TTATATTAAT ACTGCAAAGA AGAAGGCGAG GGCAGTTTCA TGACCTTCTT ATAAAGATAA AATAAAATTT TAAAATTTAA ATTTCTGTTA AAATTTAAAT GAAATTAGTT TMTGTCAAG AAGCAAAMA ATTTGAACAT TTTAGAGMT TGTTTACGTA TGATTTATTA TCTCATGATC TTGCTCCGAT GCTATTGTAA ATAATATTGA AAAAAAAACT 2401 TTCCCTAAAT AAGAAATTTA ATTGGAAAGC ACACTAATTT TAAACGCTTC AGAMTTGAT GATTTCAAAT TTTTTTTCAA ATTATTTAGC GTGTCATAAT
TAAATGGCAC CTATGTACTT CCGGGTGACT TGTGTTGAAC TGTCAATCCG CCCTCTCCAC TGGCCATTTG ATAATAATGC GTATGGCGAC AGCTCGGTCT CGAGTTACCT GATTTGCCCC CGTCATGCCC CCTGCGTCAC CAAAAGAAGA CCAGGCCGAA GAAAAAGAAG AAGTGGAGGG AGCMCACCA CCAGCACCAC
2801 TATATCGTAT CGGTCTCGTG CAGCATAATA CMCACATAT ATTTCMACA AATATATAGA GTGATAGTTG GTGTTAAAGT TGTTACTGTT TACCAGCTAC CAGTGGTTCG CGGTGGCCAC TCGATGGCGC TTGCCGTTCG GTGGTGCTGG CTATTCGCGT TTGGCTAGTC GAGGCGAACG GTGCTTTCGT TCGTGGGTCT
CGTTATGCAG CAGCMCATG CGGCCACGGA ATGAATTTGA ATTTTGATTT CAATGGTGCA TTTAGTGCGG TTTCGTTGTG AAGCTCGATG CATCAGCGCG GCCACGCCCC CGCGACTGTC CGTTGGTCAG CCAGCAGAAG CAGCATCAGC AGCAGCATCA GCAGCAGCAC TAGCAGCAGC AGCAGCAGCA GCAGCAGCAG CMCAGCAGA C M M T T T C G C G C T T C M A G TATCTCAAAT GCAGTTAACG MTCGGAACG CGMTTAATT AACAGTTTCA GTGTACCAAA GAACTGCAAG 3201 GCACAAAGGC AGCTAAAGCA AAAGCAACAT CACCATCACC ATCACCGTCT GCACCCCGCA AACAGTAGCC ACAACATTTG GCTTAATGCC CAAGCCAGCA GCAACATCCG CAGCAGCAGC AGCCGCAGCA ACCGCAACCA TGAAAAGGTA GCAACGGCTG CTTTTGGCCA ACATTCAGTT GCCACTAATC GATTGGCCM GGCAGCGAGT AAATCGAATT GGCCACCGGG ACGCCAAGGA TGCCGAGCTA ATCCAATACG CTTTTTTTTT GCCAGTGTGT TTGAGTGCGG AATTAAATGC TCTAAAATAT ATGTGCGTTC AGTTGTGAGT GCTTGTGCTT GGTCTCAGAA AAAACAGTTT AATGATCATT TAATACACCA ATAAATATAT TCAGAAATGC 3601 AGCACATCAG TAAATGGAAA TAGCAGTAAT AATTATAATT TATTCAAAAT AGAAAGCCTA GTTGAGCAAA TAACTTACTT TAAACAMTT GGATGTAGAA TTGAGCTGGA CTGTGATTAA TCATCTTGAC TCGTCTAATA ATGTTTATAA GGAACAAAAT TAACATATTC TTACTGAGAA AATCTTCTAT A C T T C W AGAAGGAGTC ACCMTCAAA ACTCAAATTT GTATGGTATA GAAGATTAAC ACATATTTAT ATGTATGTAG GTGGTAGTTG TTTGCTATGA GCCAGTATGC
f3(412)
ATACCCTTTC E G A T G A C TTGAATCTTA AAATAACCCA GATAAGACTT GATAATTGTT AGAGAATTCC ACATCTTAAC GTATCCACTA ATCGCCGCGT 4001 AATACTAACA TCGGCTATTG TATCTGTATC TTTCTATCTG TTTATCTCCC TGGAGTCTTT TTGTGTCACT CGCATGTCTT TCATTCTGCA TTCTCTCAGC
GATGGGCMT CGACGAGTCT GGATTGTGGA TCGCCAATTG TTGATCGTCC ATCATTGGAT TCGAGCACGC MATGTTCAA TTAAATAAAC GAGTCTATM CTGTCAACTG GCATTTCTM CGAGGCTCCT GCTTTGCCTC AATTATTTCG CTTTTTAAAA CCACTTGCTG CCTCGGCCCC CAGACATATT AAATAAATTA AAAAATAATT ATTAAATTGC AAATACTGAG AAACAAGACA AGCGGGCCAA ACACAAAAAA AAAAAAAAAA CAGCAAAGAG GAAAAAATTG GCAATGTATA 4401 TAGAAATATG TTTCGTACTG TATTTTCACC TACTAAAGAA GCAACAACAT ACGAAAATAA ATGGCGGGAT GGAGTGGATG AAGATGAAAA ATCAACTTTA ATTGGTTTAC AATTCATTTC TAAGCGAGAA AAGCTGGAAA ATGTACCAAG TAGTCCCAAG TCGGACGAAG AAAGCTAACT CCGCCGCCGT GCGCAATATC CATGAAGTCG TAAGCCAGTT GACTAAGTGA CAGATACTAG TATCCGTACC CATATAACCA TCATCAACAA CATGCCTCCC ACGCCTCCGT TCGGCCCCCG AAACTTCTGG AGCTCAGGGG CCAAAGCACA GTGGGACCAT GGTTTATTTA GGGTCGAATC TAAAGTAAAA TGGTTTAGGC AATATTTGGG ATAAGTAGGA 4801 GGGAATTGTT CAATACAAGC TAGTTAACTA GTACAAAAAT AACTATAAAT CAGATCATTA AAGTAAATTC TCCAAAATTA TACTGTTGGA TTAATGGATT
CATAGATATG C T T M T A T T A AAATTGATGA AAAATTATGA AAAAGGTTTC AAAAGTMTA ATCAAAATAA GTATGTATTT TTGTCTCAGA ATTCGTTTAT TGAGTTCTGG TGCCTGATTT ATTGCTTGCG AATTGTGTTA ACAATTAAAT GGTCGCGAAG TTGGTAAACT ATTTTCCTAA CTCTGCTCAG CTGCCTGTTA GGCAATTGAG AGTTGGAAAG GCCTGACCAA ATTCATGGCG TTTTGCTGGT CTATATTTTG GCGCTAGCAA ATTGGCGTGA TCACGGTGAT CAGTGGCCAT
FIGURE 3.-DNA sequence analy- sis of theforked locus. The sequence of the coding region offorked is pre- sented. Bases are numbered accord- ing to their location on the grid of Figure 1. Lowercase letters indicate transcribed, untranslated RNA. Ex- ons have the amino acid translation listed below. Underlined residues are labeled as follows: A, B, C and D identify each exon and transcript (see Figure 1); K, ankyrin-like repeat; G , glycine-rich region; H, hydrophobic stretch; 0, C A X or opa repeat; P,
area with high proline concentra- tions; and PRD, paired repeat. Trans- posable element identity and inser- tion site for the characterized alleles is marked by double underline. In- sertion of these elements occurs be- tween the underlined nucleotides.
ern and cDNA analysis: An oligo-(dT)-primed cDNA library was generated from poly(A+) mRNA of mid- late Canton S pupae. Several cDNA clones were iso- lated using the 6.2-kb EcoRISaZI genomic forked frag- ment as a probe. Genomic and cDNA isolates were sequenced and the organization of exons and introns was determined by comparing the sequence of the cDNA isolates with that of the corresponding genomic DNA. The splice site donor and acceptor sequences conformed with consensus splice sites (MOUNT 1982). As depicted in Figure 3, all clones that were obtained from this screen recognized the first of two close polyadenylation signals located at base 18,423 of the genomic sequence (PROUDFOOT and BROWNLEE
1976). Sites of polyadenylation were observed to be- gin 16-25 bases downstream of this polyadenylation signal. Analysis of these cDNA clones allowed the determination of the structure of forked encoded tran-
scripts.
The 2.5-kb transcript: One of the clones isolated during the screen, named 22G, defines cDNA A (Fig-
The Drosophilaforked Gene 51 1
5201 TATAATTCTC CGTTGGATGG GGCACTCCAC TCTTGCCCAC TGTGGCATGA GGAAACCAGG CCAAGCCAAC TAAACGGTAA CACCAGCAAC TCCAGCTAGA TCTACACGAG TGGCAGACGC CAAGACMGT GCAGACTTTA GACTGGGAGT CAGGCGTAAC TACTCGTGGA WGAACCTCT CCCCTTGACG GTTTCTATGT TTGTTTCGCT TTTAGGCGTT AATTCACCTA AAAATGTATA TAGATGTAGA TGTGAGGCAA GAAGCATAAG TTTGTCCAGC AGATCGAGCA TCGAGGGTTT MTCACAGCC CAGACITCGC AGCGAGATGT CGGAATAATG CCGACTTTCG GTTTCAGCAG GTGTATACAT ATGTGATCCA TAAAGAAATT GTACAAAtaa E 1 5601 agtgaaaaac a a a a g a c t g c agccgaaaac t t g c t a t g c a a a t c a a a g c a t t t a a g t g a t t t t t c c t a a a c c a g t t c c a t aaatcgaact c a t t t t t t t t
cccgtgcctg tgtgtgtgca t a t a a g t g t g t t t g t g t a a c g g t a c a a g g a a a a t c a a a g c t a c a c g a a a a c c g g a a a t a t a t a a a g t c t a gtgacaaacg aaagtaaaat g t g a a a t t t c gaacgtgaac ggtaagccaa g g c g a a t a a a t t a a a g t t c a c c a a a t a a g t t t a a t t a a a a caccagcgca ccaagagcac tgccaattcg aacccattca aatataaatc aatatataca t a t a t c t g t t aaaagcaaag g a t c c c a c t c a a a c a c a c a c cccagtcacc tgaccaccaa 6001 atccagtcca c t t c c t c a t c g t c a c c a t c c t c c t c c t c a t t c g c t a g c t a c t t c t c c a c g a t c a a c g t t a agaagggcct c c t a g g c c g c a t g a c c a c a a
g t c t g a c c t c ctaccgccgc agcaagcaga a g t c c a a g t c caacaatgac a t c c a c a c c a t c a t c g c c a c cgaggtggtg a c g g c g g c c a ttctcagcag cagtgtccac agcagcagca c a g g a t c g g a aatcggggat ggcggacagg g c a a c a c c a g c c t c a t c g a g gaggcagctg c c c a g g a t c a gagcagcgag gagcaggatc gccagcatca g a c g c c a a g g a t c c a c t a g c c c a a a c g g a g ATGAACCATC ACCCGCACTC GCACCACCAG CGGAAGTCCG CCGTGGAGGT
W E T A s n H i s H i s P r o H i s S e r H i s H i s G l n A r g L y s S e r A I a V a l G I u V a 6401 GGTGCGCCAG CAGATGGACG ACGACGGGGA TCAGGACTCC AACGATGACG ACGAGACCAT CCCCATGATG AGCCGCGTCG GGCGCAAGGA GCAAAACAAT
I V a l A r g G I n G l m E T A s p A s p A s p G I y A s p C l n 4 s p S e r A s n A S p A S p A s p C l u T h r W E TProHETWET S e r G l y V a l G I y A r g L y s C I u G l m 4 s n A s n CGACGCAGCG TCGTTAGGTG GGATATCACT AAAAGATCAA TGATATCTTG CGTGATCACA AAACTTTTTA CCTACATATA CTACGCAATA AGTAATACAG A r g A r g S e r V a l V a l S e
GTCMGTAGT ATAGCAGTAT ATGTCTGGGG CTAAATAATT ATAATTAATT CCATTCTCGT TGCTGCAAAT GACTGCCTTT TTGTCTTGGT TTAAAAAAGA AAATGGTTAA GATCAAAAAG AATTTATAAG CCGTTAAGAA AGTTTTCAGT TACATCGTGC GTACTAGTGC GCTTGCACTT CTTCGAATTA CATTTTCATT
6801 GATGCTTAAA ATGAATCATG AACTGAAACC GTGTACTCAT ATAAATATGC CTTCGATCAT TGGTTTGAAT GACTAATTAT ATTATAAAAT ATTTACATAA AATATTGTGT CTCACAAGGC GATTAATTGG TTTAATGCGA TTAAAATTTT AAACTTATCA TTTAATAGAG GCCTATAGAT TTGCACATAA CTGGAAACTT GTTACCCAAT TCCTATCMT GGCAGACCAC ATTTGATGGG ATTATAGTCG GCTGAACACC GTGACAAATC GGTGCATAAC TTTTCCCACA CATTCTCCAA AGTTACCATT TCCATTGGCG GTGGAGTTGT TAATCTCTGC CTACCGTCCT CCATCTCTTG TMTCGATCG ATCAATGATA TTCCGCAATA AAAACGAGTG 7201 AAAACCGGTC AAAGAAGAGG AAAATGCTGA GTAGAGATTT TCCAGTGACC CAGTGCGTTT CTATCGATTA ACTGCAGTTA TATGTTATAG ACACTCTATA
TGCCATATCT GCGTAGGAAT AGGTGCCTTC GAAATGAGAG CACTTTTAGC TATTTCGGTG CAACTTTTTG TTCATTAAGC TCCCCTTTGT ACGGATACTA
TACCACCTAA TTGGATGACA AGTAATGAGC CTTTTTTCCG CCGCATTGCA GCGCCAATAC GCAAATGGAC AACGATGTCA CGCCCGTTTA CCTGGCGGCG s A l a A s n T h r G l r M E T A s p A s n A s D V a l T h r P r o V a l T v r L e u A l a A l a
G l n G l u G l Y H i s L e u G l u V a I L e u L v s P h e L e u V a l L e u G l u 4 l a G l y G l v S e r L e u T v r V a l A r s A l a A r q A s f f i l v M E T A l a P r o l l e H i s A l a A l a S K 3
7601 CACAAATGGG CTGCTTGGAT TGCCTCAAAT GGATGGTGAG TAGATGGTGC ACCAAATGCT TCGAATTCGG TCTTTTCATC CGATCTGTAA GCAGCTAACT e r G l r U E T G l Y C y s L e u A s p C v s L e u L v s T r p n E T
GCCAACTTCC TGGCCATCTT GGCAACTGCA GCAGTTGAAC CAATGCAGG TCMTTGCTA CGTTAAGTGC ATTGGCCAAA ACTGAATGTT GCATTAGGAT TGAACAGAAA CCGCAAAACA GAAAGGGTTT TTGCGGCTTA TAAATATTTG ATATTATTAT AATATGATCC GATTTTTAGG GCGGAGTMT AGATGATTAG T U T C T A A A A TGAGCAAATT GCTGCATTAA TATCAATTTA CAATGCATAG CATTGAAAAT CCATCAAGCT TGTACATTCT AAGAGTCTAC TTTCATTGAT
8001 GCAGAATACT ATTCTATTAC TCGCAAAAGG GACATGGATT GTTCTTAACC CATTCATTTT TCTCAGTCGG CTGTGAACCC GTTTCTTTTT GCTAATGTCT GTGGGTCCTG TTTTTCAATA GTTTTAACGG GGTTCTCGAG TGGGATTCCC GGCCAGTTTG ACGCCACGCC AGCGCCATTT ATCAATCGGG GAATGCGCTC GATGGAGGAG GGAGCAACTT ACCAAGTAGC CAACTAGCGA AGTGGACAAG TGACCGAGGA GCGGAGAGAG TGGCTTATGC AAAGCCACAT CTTCCAGTGC TGACAATGGC CTCCCAATAA CCAGAATGCA ATTTCAAGAT CATCGCCAGC GGCAGTGGM AACCGCCTTT CTAAATTGAA CTAAATAGTT TCCATTATAG 8401 CCGCAAGCCT GCCGGCTGGC AAACCACTTT CCACATACGC AGTCTGTGAG ATACTTAACC GCCGAAAGAC ATTGAGTTGA TTTAGTCGAC AAGGAGTCGG
CATTTCGATG GACAGCGCGG AATGATACCT CAACATCTCG TCAAGACATC GTGCTTTTAG ATGGATAAGC TGTAATAAAC ATGGTCAAAA TGACAGTTGT GCTTCGMCT AAAGAGACGT TGATGGATCG TTCGGTTTCG G G T G G T T T M M A C A T A T T C ATTAGTCACT TATAACAGAT AGTAAATCAT TTTATTGCCC ACATATTATA TTTGTTTTTC CCTCTTATCC CATCAAAGGT CCAGGATCAA GGTGTGGATC CCAATTTAAG GGATGGCGAT GGAGCTACGC CCTTGCACTT
V a L G l n A s D G l n G l W a l A s d r o A s n L e u A r g A s D G L Y A s D G l y A l a T h r P r o L e u H i s P h
E3, C 3 K3 K4
8801 TGCCGCCTCC CGTGGACATC TATCCGTAGT CCGTTGGCGG CAATCACGGA GCAAATTGTC CCTGGATAAA TATGGCAAAT CGCCCATTAA CGATGCGGCA e A l a A l a S e r A r g G l y H i s L e u S e r V a l V a I A r g T r M r q G l n S e r A r g S e r L y s L e u S e r L e u 4 s D L v s T Y r G l Y L Y s S e r P r o l l e A s n A s W l a A l a
K5
GAAAATCAGC AGGTTGAGGT CAGTGTTGAA CTAGAAAAAA TTTCAATTAA ACATACATAT TAACATATTA ACTATTTACC CCTACAATTT TTTCTTGTTT G l u 4 s n G l n G I n V a l G l u
AGTGCCTGAA TGTCTTGGTT CAGCACGGCA CTTCGGTGGA TTACAATGGC AAAAGTAGCT CACAGCGGCA CAACTCGCAG CAGCAACTGC ACCAGCAGCA C v s L e u 4 s n V a l L e u V a l G l n H i s G l y T h r S e r V a l A s p T y r A s n G l y L y s S e r S e r S e r G L n A r g H i s L y s S e r G l n G l n G l n L e u H i s G l n G l n H i
84, C 4 K5 01
CCAGCAACAG CAGCAGCMC AACAGCAGCA GCACCTGAGC AGCAGCTGCA ACTCGAACTC GAATTCCTCG AAGCAGACGA GTCGCAGCAA TACGATCAAG s G l n G l n G l n G l n G l n G l n G I n G l n C l n G l n H i s L e u S e r S e r S e r C y s A s n S e r A s n S e r A s n S e r S e r L y s G t n T h r S e r A r g S e r A s n T h r l l e L y s
Exons A3, A4, A5 and A6 hybridize to all forked transcripts with the exception of the 1.1-kb RNA, which only hybridizes to A3 and A6 (see below). These results suggest that cDNA A, defined by clone 22G, corresponds to the 2.5-kb transcript.
Sequence analysis of clone 22G indicates that the 5’ end has a guanosine nucleotide that is not present in the genomic sequence. This difference is presum- ably the result of the 7-methyl guanosine cap of the mRNA (BROWN et al. 1989). The transcription start site of this transcript is AGGCTGC, located at base 12,982 in the genomic sequence (Figure 3) and has weak homology to the ATCAG/TTC/T sequence that is conserved in the transcription initiation site of
other Drosophila genes (HULTMARK, KLEMENZ and
GEHRINC 1986). A sequence resembling a T A T A box is flanked by a G-C rich sequence and is located 32 bp upstream of this transcription site. There is a possible CAAT box located 61 bp upstream of the transcrip- tion initiation site (Figure 3). The low extent of ho- mology of the transcription start site, T A T A box, and CAAT box with that of the consensus for these pro- moter signals in other eukaryotes may explain the relatively low abundance of this transcript or may be due to a pupal or tissue specific promoter sequence. Similar low homologies have been described for the pupal specific transcript of the singed locus (PATERSON
512 K. K. Hoover, A. J. Chien and V. G. Corces
9201 TCCAAGTCCT CCTCGACTTT GAGCTCCGAC GTGGAGCCAT TTTACTTGCA TCCACCGGCC TTAGCAGGCG GATCAGGAGG ATCGGGMTG GGCTTGGGCA S e r L y s S e r S e r S e r T h r L e U S e r S e r A s p V a l G l u P r o P h e T y r L e u H i s P r o P r o A l a L e u A L a G l y G 1 y S e r G l y G l y S e r G l v n E T G l v L e u G l y l l TGGGMTGGG CATGGGCATG GGCATGAGCA TGGGCCTGGG MTCTCAACG AAGAACAGCG ATGCCTTGTA CTCGCAACAA TCGAGGAGTT C C T C C G A W E T G l V n E T G l v W E T G l y W E T G l V n E T S e r l l E T G l v L e u G l y I l e S e r A r g L y s A s n S e r A s p A l a L e u T y r S e r G L n G l n S e r A r g S e r S e r S e r G l u L y
HZ, GI
GTTGTACAAC GGTTCGTCCT CCTTGGGTGG CAACAGTTCT TCGGGAGGCG GAGGCGTCGG AGGAGGAGGT GGAAGCGGTG GAGCCCTGCT GCCCAACGAT s L e u T y r A s n G l r j e r s e r s e r L e u G l y C I y A s n S e r S e r S e r G l y G l v G 1yGlWalGL Y G l Y G l v G l v G l y S e r G L v G L v A L a L e u L e u P r o A s r A s e
H3, G2 GGGTTGTATG TGAATCCCAT GCGGAATGGC GGCATGTACA ACACCCCATC GCCGAATGGC AGCATCTCCG GGGAGTCGTT CTTTCTGCAC GATCCGCAGG G l v L e u T v r V a l A s n P r d 4 E T A r g A s n G l y G l y N E T T y r A s n T h r P r o S e r P r o A s n G l y S e r l l e S e r G L y C l u S e r P h e P h e L e u H i s A s p r o G l n A 9601 ACTATCATCT ACAATCGGGT GCGGGATCTG TTCGTGGACT CCGATTGCAG CAGCAGTGTG AAACAGCATG GCAAGAGTCA CGCCCACCAG CTGCTACAGT
s p T y r W i s L e u G l n S e r G l y A l a G l y S e r V a l A r g G l y L e u A r g L e u G l n G L n G L n C y s G I u T h r A L a T r p C l n G l u s e r A r g P r o P r o A I a A l a T h r V a CGAAAGCGGG CAACGCTATG ACCATCAGGC GGACGTCACT CCAGCTCCAG CGGACGGCAG CGGCTCCGAC GAGTCCGTCT CCATCTCGTC AGCTTCAACT L G l u S e r C L y G l r A r g T y r A s p H i s c h A 1 a A s p V a l T h r P r o A L a P r o A L a A s p C l y S e r G l y S e r A s p G l u S e r V a l S e r l l e S e r S e r A l a S e r T h r CGCCGAAGCA GCAGCAGCAG CTGCGCAGCC AGAGTTTCAA TCGCCACGGT AGCAGCAGCA ACATCAGCGC CAGGCAACAT C A A G M C M C A A C T A C M G G A r g A r g S e r S e r S e r S e r S e r C v s A l a A L a A r s V a l S e r l L e A L a T h r V a l A l a A l a A l a T h r S e r A L a P r o G l y A s n l L e L y s A s n A s n A s n T y r C y s V
TGACAAGGGC MGCCCAGCA ATCAGAATGG CGGCGGGGAT CACGACTACT GAGGACATAT ATCTGGTTCG CGAGGAGTCG CGCAAGCAGC ACCAGCAGCA ~ 4 . ~ 2
10001 ACAGCAACAC CAGTTGCAGC AACAACAGCA GCTGCAGCAC CAGGCCACCC AGCAGCAGCA GCAGCAGCAA CAGCAACAGC AGCAGCTGCA ACACAAATAC a l T h r A r g A 1 a S e r P r o A l a I l e A r g H E T A L a A l a C l y I l e T h r T h r T h r G l u A s p l l e T y r L e u V a l A r g G l u G l u S e r A r g L y s G l n H i s G L n G L n G L
n G l n G l n H i s G L n l e u G l n G l n G L n G l n G 1 n L e u G l n H i s G l n A l a T h r G L n G I n G l n G I n G l n G l n G l n G l n G l n G L n G I n G l n L e u G l n H i s L y s T y r 02
AACGTAGGAC GATCCCGGTC GCGGGACAGT GGCTCCCATT CGCGATCGGC CAGCGCCAGT TCCACCAGGA GCACGGACAT TGTGCTGCAG TACAGCAACC A s n V a L G l y A r g S e r A r g S e r A r g A s p s e r G l y S e r H i s S e r A r g S e r A 1 a S e r A l a S e r S e r T h r A r g S e r T h r A s p l 1 e V a l L e u G L n T y r S e r A s n H ACGTAAGGTA CATTTGTTAT TGCAACAGAA AGGTCTAAAA GTATGCAAGA AAATATTTTA ATTTAATGCA ACAAACAACA TATTTTTAAA CCTGATATGC
ATTTGCTACT TAAAGAAACC TTTCAATGTG CTAATTAACT AAATTATGTT TTAGACTTAT ATTATAACAA TTCCTTTTGA TCCTTCAGCA TTTGAACAAC i s
H i s L e u A s n A s n B5,C5
10401 M G C G C M C A TGAACAACAA CAGCAACATG AACAACAGCA GCAGCAACAT CGCACAGAGC AGCAGCAGCA GCAACMCAA CAACAATAGT CTGCTGAATC 3."Continued' L y s A r g A s W E T A s n A s n A s n S e r A s n H E T A s n A S n S e r S e r S e r A s n l 1 e A l a G l n S e r S e r S e r S e r S e r A s n A s n A s n 4 s n A s n S e r L e u L e u A s n A
GCAATAAATC ACATTCGATT ATTGGACTGC ACAGCAGCAA ATACGAGTCC TGCCTGAAGG ACAACTACAG CGCAAAGAAC GTCAATCTGA AGAACCAACT r g A s n L y s S e r H i s S e r l l e I l e G l y L e u H i s S e r S e r L y s T y r G l u S e r C y s L e u L y s A s p A s n T y r S e r A L a L y s A s n V a l A s n L e u L y s A s n G L n L e
f17k (hobo)
CCTCAAEGC GGCATCAAGA GTGATACCTA CGAAAGTGTC TGTCCGCCGG AGGATGTGGC CGAGCGGACC AAGCAGACGC ACAAGAACTC CATGATCCGC u L e u A s n G L y G l y I l e L y s S e r A s p T h r T y r G l u S e r V a 1 C y s P r o P r o G I u A s p V a l A l a G l u A r g T h r L y s G L n T h r H i s L y s A s n S e r W E T l l e A r g
AACMTCTGG CGGATGCCTC CAGCAACAAC AACACCAGCG GCAGCATCAA CMCATCAGC AATATTGGCA ACATGAACGG TGGCAATCAG AGTAGCAGGA A s r A s n L e u A I a A s p A l a S e r S e r A s n 4 s n A s n T h r S e r G L y S e r l l e A s r A s n l l e S e r A s n l L e G l y A s n N E T A s n C 1 y C l y A s n G l n S e r S e r A r g A 1 0 8 0 1 A T C T C M G C G GGTCTCCTCA GCGCCACCAA TGCAAAATCT GGCCGTGGGT GAGTACAGAA TATTTGAAAA CMTAAATAA GAAGATAACA ACTTTTAGAC
s n L e u L y s A r g V a l S e r S e r A L a P r o P r o W E T G l n A s n L e u A l a V a l
TTATATGATA CTATGACTTA GAGTAAGTGC TCTGCTCTTG AGCCAATCTC GTTAGTAAAG TGTATTAAAC CTTGAAATCC TCTTCTAATC CTTAGTGAAT
V a l A s n 86, C6
GGACCACCTC CACCACCACT ACCTCCACCA TTGAGGGCAC CAGTTCAACA GCAACGCAAT MCTTGAGCG TCGATCAACC GAACAGCATG ACCGGTCCAA G l y P r o P r o P r o P r o P r o L e U P r o P r o P r o L e u A r g A l a P r o V a l G I n G 1 n G l n A r g A s n A s n L e u S e r V a l A s p C l n P r o A s n S e r n E T T h r C l y P r o T
CAATGCATCT ACAGAAGAAT GTCTTCGGAC CGAATCAGGG TCCAACCAAC GCCAATGGAA CAGGTGGTGC AGCTCCTCCG CCAGTTCCAG CACCCMTCC h r W E T H i s L e u G l n L y s A s n V a l P h e G l y P r o A s n G l n C 1 y P r o T h r A s n A l a A s n G l y T h r G l y G l y A l a A l a P r o P r o P r o V a l P r o A L a P r o A s n P r 11201 CGTGGCCACT GMGCAGTGG ACTCGGATTC GGGACTTGAA GTGGTCGAGG AGCCGAGCTT GAGGCCATCG GMCTGGTGC GCGGCAACCA CAATCGCACC
o V a l A l a T h r G l u A l a V a L A s p S e r A s p S e r G l y L w G l u V a l V a L G l u G L u P r o S e r L e u A r g P r o S e r G l u L e u V a L A r g G l y A s n H i s A s r A r 9 T h r ATGTCCACCA TATCGGGTGA GTTCTMATG ATTAAAATTA TATACTCTCC TTATCTTCCA CTAATAATGG AGTTAAATTA GAAAATCTCG AATGATTAAT W E T S e r T h r l L e S e r A
GTAGTTTATA TCAAAATTCT TGCGCATGGT TCGGAGTTAT CCAGGACCCC ATGACCATTA TCACACCGAG AAATCCATCC GCTATAATCA TTGATTGACT GATTGATTM AGTTCCCTAA CCGTTTGCCA GGTCTTTGAA CCCCATGATT TTTGTGGGGG GTTCCGCATC GTCGATGGCT TGTGACCACT GTCCACTTGT 11601 CCGTCTTGAT TGATCAATTA GGGGMATTC GCCAGTCATG TTCGGCTGAT GGGGGGAAAA TGGCTGTGAA GTGCCCTGCA GTGAAAAGAT ATAGAAACGA
GTTGCTCGAG GAGAACCTGA TGTCTGAGCG GCAGTCGAAG GCTGTCGATT GGATGGGATA ATGGTTACAT CTTGATTGAA GTGTTGGCTT TCAAGCAGTG TTAGCAAATG GCTGTGCGAA CAAAGCCATA CAAATAATGT CGTCAGTTTG GAAAATATTA TGATTGATTG TGTGTAGCCA TATGGMACA ATTTCTCCTT ATTTCAACAA ATGCAGCTTC GCAATTTTCT GTCCACTACT TTTCATCACC TTTCAGGATG CCTTTGCTM TTAGTTAAAA ATCCAAGTGG TCGTTTTCTA
1 2 0 0 1 GCCGATTTGT TGAAGCCCGA AGACAATAAC CTCATTATCA CACAATAAAT ATCGGGTATT ACTTAATTGC AAAAATAGGC GTAACTACCA AGCAAAAMC
present at the beginning of the major open reading frame of this mRNA, preceded by an in-frame non- sense codon. Sequence flanking the first of these translation initiation sites best fits the nucleotide bias in Drosophila melanogaster (CAVENER and RAY 1991). There is an inframe termination codon 2 1 nucleotides upstream of this ATG site. T h e 2.5-kb RNA contains six exons and spans 5.5 kb of genomic DNA (Figure
3). T h e open reading frame of this mRNA is 1.9 kb and encodes a polypeptide of
7 1 kD.
Sequence analysis of cDNA clones shows that alter- nate acceptor splice sites are acknowledged at two locations in exons common to the 1.9-, 2.5-, 5.4-, 5.6- and 6.4-kb transcripts. In one instance, the alternative
acceptor-splice sites correspond to the fourth exon of the 2.5-kb transcript. These alternative sites are lo- cated 24 bp apart at nucleotides 16,892 and 16,916 in the genomic sequence. T h e downstream site is acknowledged in 75% of the clones examined. T h e fifth exon of the 2.5-kb mRNA has the second pair of alternative acceptor splice sites observed during se- quence analysis of the isolated clones. Twelve nucle- otides separate these sites. In this case, the upstream site located at nucleotide 17,206, is acknowledged in
The Drosophilaforked Gene
M C T W T G C G M T M T ACCTGCCATG TTTGACATGT GTTGGCTCCG ATTTTCCATT TGTTAATTGC CCATTATGTC CGATGGGGCA GGACAAATGG AGACAGGCTT TMTTGAGCC GACGTTCACA CGATAGGCGG CGMGATCCG TGCAMATGT GGCCAAMU MGAAGTGGA AGTCGGGTTT CTGCTGGMA A C T G A M M G TGGAAGTGCC M C G A T G M A C M T C C C A G C CACTTCACTG CCGTATMGA CACMGTCAT TCGATATMG CTACACTUG CGAMAGCCG 12401 TAGCTCATAT GCTCCTTCAC ACGGGAMAA ATATGAGGGG GCTTTCTTAT AAAGGTACGC A A T C T A M A T C T T G G T M C T G T C T M T C T C TGTACATCTG
TACATCTCTG GCTTATMCC CMTTGATTT MCATCATTT TGATMAGAG TTTTTACCAA TTGATAMTT CGGTAGATGT GAATTATTTC GAGTCCATTA ACCTACMCA TAGTCGATTG TCGTTGGGTT AGAGGCCTTC AAAAGACCTA TGCCATATAC CCACAGTGGA CATTTGGGAT CCCCGGATTC GCGATGAGGA AGCMTTTGT CACMCTCGC GAATGTCCCA CACTATACAC ACTTGCACAT AACAATTGGC GACCTTCGAG CCATCACCCG CACCACCGCT CACATGGGTC 12801 ACGCCCACTG CGCCTCCTCC GCCTTTTGGT AATCGGCTCG TGGAACTTCG CATCTGCCM CTGCGAAAGT TTTACCCGCG AAAGTGAACG TGTTTCGCGA
AGCGACCTCT CTCGTGTCTC CMTGCGATC CGTCGTCATG GTCTTGGGTG AATAACTTTG GCGATGGCTT GAACCCAGTT C a g g c t g c c a acggagctcg A 1
cccaggcaac a t c g t c a c c c g g a g t t g a g a g t c g c g a c t t t t t c a a g c c t c c a c c c c g a c t t a t c c g g t c g t t c g t t c a t a t c g c g t t g a a c a t t c t g a t a a t g t g c a t t g a t t g a a c a t a g a t a a c a g a ATGAATATGC TGGGCTCGGT TMGATGCAC MGAATTCGC GCCGAAACTA TTTCACGCTC AGCTTCCATC
METAsnWETL e u G l y S e r V a I L y s M E T H i s L y s A s n S e r A r g A r g A s n T y r P h e T h r L e u S e r P h e H i s A 13200 GGTTCGATAT TTTTCGGTGT MGTGCTCCC MCTGAAATC GGTGTTGTAG A T M T C C C M ACGGTTACAG CCCGTGAAGT GTATTTACTT TMGTTTTTC
r g P h e A s p I l e P h e A r g S
TGTGTTTTTG TTMGTGGTT TTAGAGCTAC ATTCTTTGAT GGCTTTAAAT AAGAACGGCT ACATTTGATG ATATATTACA AACGAGTMT ATATTTACGT TATGTGTTGT GCAACTGGTA CCTTTTCTGC TTAACTTACA ATTTGTTTTT ATAAATTTTG TAAAATAAAT TTTAAACATA TACCGCTTM MTGTAGTGA G C C C A C M T T T C T T A A A T M T T A A C G T M A CACATTGTAT TCTTTMTAC TTACACTTTT CTAACAGATT AAACTTAATA TTTATTGCTT MTATTATGC 13600 AATACTAACT CTTATGATTG AAATGTTTAA GCATTAGTTC CTAAATCGTA ATATTTTATC GAGACTGGAC ACCTCTTAAA GAAATGTGAT TTGCACTTTC
TMGCCAAGC CAATTAGCTC AATTTAAAAA CTGATACCAT GAAGATCGGG CATTGCACTC GTAAAACTGC AAACGAAACC AAACTGCACA TCAACTCCM TTTCGAATTT CACCGCGAAC TTGACAGGTA CAAGCCCCTT TTTGCTCCGC ACAAATTAAT AGATGAGATG TGTGGGAACA TGTGGAAAAA TGAAAAAATG TGCTTTTCGT TAGCTCGCTT GAGTTGTTTT TTCGAATCGA ATGAATAAAT TCAATGCAAA CTGACTGAAC TCGATTGCGA GTAGTTTTGA CCATAAGAGT 1 4 0 0 0 GCAGCCATGA GTCTGCACCA CCACCTCGTA TGGTCTCGTC CAAACCACGT AATGATCGCC GTGAAAACGG GTTACCAAAG GCCGTGAAGT TGGGGACCCG
ATGCGATCGG GTTTAGATGG TGTTTGGCCA ATTAACCGGC TGCCTGGAAA TTGAATGCAG AGAGGCGACT TCAATGGAAA CGGCGGCTAA T T C M T T T T T ATGGGTATAC TCCCACGAAG ACGTTGGCGG GGMCTTATT CTTCCGTGTC TGCGGATATT MTCCAACTA TATAATATAT ATATGATTTC CCAGGAAGCA
14401 TATGTTCTAT CCMTCATGT GCGGACTTTT AGCCACATTC GTTGGCCACC TATCTCTGTT CCTTGGTCTC ACCATTTATC AMATTGGGG TATCGAGCAT MGCGAGCTT TCCATCAAAT CTGAGCTGCT TCCAGGCAAA CAATAAGGAA AGGCATTGAA TTGGGACACT TGAATGTTTC CGACATGCCA ATATCTAAAT
CTCTCGATTA CATATACATT TAACCACCM TMGTTATCA TTTATCAACT TGAGATTCTG AAGTTTGGTT ACCAATTAAT ATGTATGCCT M A A T T M T T
GAMTATAAA TATTTTAAAT MACTGCAGT AAAAGTGAAC CCGACTTGAT TTGGAAGCCA AGGAAATATT TAAAATATAT TCATTTTTTG AATGTATATA FIGURE 3.-Continued. TAMTGATTA TTATAGTTCC CAAAATGAGC TCTAATTTCA AATTATTTGC ATTCCAATAT GCAGCACTCA TTTAACATTT CTCTCCGTTA GACTTCCACA
14801 CAAAAGTAAT TTATTTGATT CACTTGTTCG TATTTTTATA TGCAACTGGG TTGCAACTCC GTCTCCTTCT CCTCGCAAAT CTCCGTCTGG TGGTGAAAGG CGCCTACGCA GCGATCCATT GTGGTTTGGG CTTCTCGTAG CTTCGTAGTT TCATAGTTTC GTATTTTCTT CAGTACATTT GCTGTAGTTT ATTTGCTGGC
TGCTCAAGTT AGCATGACCA GAGTCAGACT GCTGATAACT GCTAACCCAC AAAACTCATA AATAAATTGG TTCAATTCCA GCGAACAAGA MGCCAAGCT e r A s n L y s L y s A l a L y S L e A2.07.C7
GCTCMTGCC GGCAGCACAA ATGGTAGCTC CATCGCCGCC AGCTCGCATG ACTCCCAGTC CCGATACGW GGATCTGTAC ATGCTGCCM TGGTTCTGCG u L e u A s n A l a G l y S e r T h r A s n G l y S e r S e r I l e A l a A l a S e r S e r A s p A s p S e r G l n S e r A r g T y r G l y G l y S e r V a l H i s A l a A l a A s n G l y S e r A l a 1 5 2 0 1 GCCAATGGTC ACTTCTATGG CTACTCGGAG GGCAACAAAA ACCAGGGGAG CAATGCCAAT GGCAACGGCA ACGGCAACGG AGGTGGCGGC AACTATCACA
A l a A S f f i l y H i s P h e T y r C 1 y T y r S e r G l u G l y A s n L y s A s n G l n G l y S e r A s n A l a A s n G I y A s n G l y A s n G l y A s n G l yGlyGlyGly A s n T y r H i s S GCCATCCCM TCAGCCACAC TACGCGACTG GCCAGCCGCA TCAGTACACG CCCAGCCTGT ACGGAGGAGG CTCCTCCGCC GAGGATCACT ACGAACTTAC e r H i s P r o A s n G l n P r o H i s T y r A l a T h r G L y G l n P r o H i s G l n T y r T h r P r o S e r L e u T y r G I y G L y G 1 y S e r S e r A l a G l u A s p H i s T y r G l u L e u T h
ACAGCAGCAG CAGCAAATCT ACGGGGAAAC CCATTTGCCC AGCGGACAGA GCACGGGCGG ACCCAATCTG GTCAACAAGC AGCTGGTGCT GCCTTTTGTG r G L n G l n G l n G l n G l n I l e T y r G l y G l u T h r H i s L e u P r o S e r G l y G L n 4 r g T h r G l y G 1 y P r o A s n L e u V a l A s n L y s G l n L e u V a l L e u P r o P h e V a l
f hdw) CCGCCCAGTT TTCCCAATAA ATCACAAGAC GGCGTCACGC ATCTCATCAA GCCGTCCGAG TATCTCAAGA GCATCAGCAT CCACAAGCGT TCGTGGCAT P r o P r o S e r P h e P r o A s n L y s S e r C l n 4 s p G l y V a l T h r H i s L e u l l e L y s P r o S e r G l u T y r L e u L y s S e r l l e s e r l l e A s p L y s A r g S e r C y S P r o S
f ' , f 5(9YPsY)
15601 CCTCCGCCCG GTGAGTAGTG TTTATCCCCC AAATACCCCC ATGAATCGCA CAGTATATGT ACCTACTCGT A Z T A T C C T C CATCCTCCGA CTTTCAATTA e r S e r A l a A r
GCACATGGCA GCCGATTTAA TTTGGCTGAA CATCTCGTGC AACGGCTTTT TATAATTAGT ATCGGCGGTA CTGTTGCCTC GCCTCAATTA GCCAMATCT
f h 9 Y P s Y )
A M A M M A A AAAAATATAL STATAAAAAG ATACATGGCT TCAAAGACGC TTGGCTGCCG CTGAGCAATT AGCAGCTAGC AGATTGGCTA GATCTCTGTG CTGCGCAATT TGCAGCGGAA MTGCCTAAA GAAAATATAG CTAGATTTTC TATACCCAAC AATAGTTTTT CTTACAAAAT GCAATTTGAG ACAACTCMT
16001 TTGAGGTAAG CGTAGCTGCT GACAAATTGA TGTTTAATGT GCCACTCATC AGTAATGGTA TTGAAGTTAC TTGGTAAAAC AAGCTCGGAT T G A A W GTCAGATGCG CAGCACTGTA GCCGACAGAT ATTCGCGACT TCCTCTGCCG ATCAAGAGTG CATGAAATTT TATGCGTGCC ACTTTGCGAC CTGCCACTTT
513
will not be altered depending upon the selection of the site. Acceptor splice site selection does not appear to be specific to each size class forked RNA, as two independent clones of the 2.5-kb transcript display differences in both alternate acceptor-splice site selec- tions (data not shown).
The 5.6-kb transcrzfit: T o determine the structure of the larger forked transcripts, a random hexanucleotide primed cDNA library was generated from mid-late pupal poly(A+) RNA. This library, as well as the oligo- (dT)-primed library described above, was probed with DNA fragments spanning the forked locus depicted in Figure 1. Three overlapping cDNA clones that were isolated define the structure of the 5.6-kb transcript (Figure 1). Clone 15T hybridizes to all forked tran-
M E T A r g A l a T h r L e u A r g P r o A l a T h r P h 01
scripts on Northern blots whereas clones 86E and 1 1 E
514 K . K. Hoover, A. J. Chien and V. G. Corces
TCTTACCACA CATGCGATGT TGCAGCCGCC ACTGCMCTC AGTTCCAATC CGAACATGAT ACTGCGCACA CGCCACACCT CGCGCAGGTC CGCCATTCGC e L e u T h r T h r A s p A l a M E T L e u G l n P r o P r o L e u G l n L e u S e r C y s A s n P r o L y s M E T l l e L e u A r g T h r A r g H i s T h r T r p A r g A r
CTACTGGCCA CMCCCACCC CCATCTCCCA ACTCCCCTCT CCCMTTGCC ATTTTCCAGT GUAGTACCC CACATCTCAC ATCTCTACCT M T C C M C A T 16401 TTTCCCCCTT TTGTTCCACC MCACCTCCA CGCATACGGA CWCTACATG CACATACAGA TCGCCCAGCA GCAGTCGCAT CCGCACCACC ACCACCACCA
g S e r T h r A s p T h r G l u A s p T y r M E T G l n l l e G l n l I e A l a C l n G l f f i l n S e r H i s P r o H i s G l n H i s C l n H i s H i
f 36a (springer) f 1 4 ' Q g y p s y ) A3,B8,C8,D2
CCCCCATCCC C E C C S A T C CGCACCATCC CCACCAGCTG CATGGAATGG GCCAGCAGCC GCCGTTGCCC CCACCACCGC CCCCTCTACC CCACCWATG s P r o H i s P r o H i s P r o H i s P r o H i s H i s P r o H i s C l n L e u H i s C l y W E T C l y C l u C l n P r o P r o L e u P r o P r o P r o P r o P r o P r o L e u P r o H i s G l y W E T
PRD P TTACCCCCGC AGCCACCGAT GGCTCCACCG ATCCGTGTAC CAGGTCCCCA CGACGCAACC ATCTCTGCCA AATCACACM GCCCAACAAG CCCCACCCAT
L e u P r o P r o C I n P r o P r M E T G l y P r o P r o M E T G l y V a l A I a C l y G l y G l M s p A l a T h r M E T S e r C l y L y s S e r H i s L y s P r o L y s L y s P r o H i s C L y S CCCCACCCM TACCCAGGAC ACCGCCACCA GGAAGCACCA CCAGCCACTC TCGGCCATCT CTATCCAGCA TCTGAACAGC CTGCACGTCA CCMTTGGCA e r A l a P r o A s n S e r C l n A s p T h r A l a T h r A r g L y s C l n H i s C l n P r o L e u S e r A l a l l e S e r l l e C l n 4 s p L e u A s n s e r V a l G l n
16801 CTCCACTGTG CTCAGGMCA CTTATTATTA GGCAGGCTCA T A A G A M C M T T G l C C T T M C T T A C T A C M TTCCTCCACC M T T C T C C T T CACCTGCCCC L e u A r g A A&,BP,CP,D3 GCACGGACAC ACAGAagGTA CCCMCCCCT ACCACATGCC ACCTCCCAGC CTCAGCATGC AGTCTCTCAC CTCTAGCACG GAAACCTATC TTAAGTCCW r g T h r A s p T h r C l n L y s V a l P r o L y s P r o T y r G l W E T P r o A l a A r g S e r L e u S e r M E T C I n C y s L e u T h r S e r S e r T h r G l u T h r T y r L e u L y s S e r A s
CCTMTTGCC CACCTAAAGA TCTCCAACCA CATACCGCCC ATCMGAAGA TCAAACTGGA GCAGCAGATG GCCAGCCGGA TGGACTCGGA GCACTACATG p l e u l l e A l a G l u L w L y s l I e S e r L y s A s p l l e P r o C l y I l e L y s L y s M E T L y s V a l G l u G l n G l W E T A l a S e r A r g M E T A s p S e r G l U H i s T y r M E T
GAGATCACCA AGCAGTTCTC GGGCAACMC TACGTCCACC AGGTCAGTTC TGGCCMCTG GTCCCAGCCG AAACACMGC GAACGGTGGT TAAGAGGCGG G l u l l e T h r L y s C l n P h e S e r C l y A s n A s n T y r V a l A s p C In
17201 CTCAAAWTC TACTTGCagA TACCGGAAAG GGACCACCCC CGCAACCCCA TACCCGATTG CMGCGCCAG ATGATGCCCA AGAACGCCGC CGACCCGGCC I l e T y r L e u G l n l I e P r o G l u A r g A s p C l n 4 l a G l y A s n 4 l a l I e P r o A s p T r p L y s A r g G l n M E T M E T A l a L y s L y s A l a A l B G l u A r g A l a A5,BlO,ClO,D4
MGMGGACT TCGAGWCCG TATCCCTCAC WCCCAGACT CACCCCGCCT CTCACAGATT CCCCACTGGA AGCGGGATCT ACTGGCGCGA CGCGAGCAW L y s L y s A s p P h e C l u G l u A r g M E T A l a C l n C L u A l a C l u S e r A r g A r g L e u S e r C l n l l e P r o G l n T r p L y s A r g A s p L e U L e u A l a A r g A r g G l u G l u T C C G A C A A C M G C T G M G T M GTGCGTGTAG TCCCTACTGG GCACTCTCCT GACCCCTCCC CTCAATTGTC TGACCTTCTT CTAGGGCGGC CATCTATACG h r G l u A s n L y
s L e u L y s A l a A l
a l l e T y r T h r
A 6 , B l l , C l l , D 5 CCCMGGTGG AGCAWACM CCCCCTCCCC GACACCTCGC GGCTGAAGAA CCGCGCCATG TCCATCGACA ACATCAACAT CAATCTGGCC CCTATGGAGC
17601 AGCACTATCA GCAGATCCAG TTTCAGCACA AGCAGCAGCA ACTCCACTTG GGCAACAAGG AGAACCATGT CGATCCCAAG GGAGCTCATC AGGATCAGCA P r o L y s V a l G luGlnAsn4s n A r g V a l A l a A s p T h r T r p A r g L e u L y s A s n 4 r g A l a M E T S e r l l e A s p A s n l l e l s n l l e A s n L e u A l a G l y M E T G l u C
I n H i s T y r C I n G l n l l e G l n P h e G l n G l n L y s C l n G l n G l n L e u H i s L e u G l y A s n L y s C I u A s n H i S V a I A s p P r o L y s G l y A l a A s p C lnAspCln4s TCACGAGCAC GGAGTTGCCA TTCGCACTCC CCTTGCTAAC CCTAATTCTG GCCTCATTGG AGGTCAGGAG CTCATCGACC CATCGCACGG CACCCACCGC p G l n G l u H i s G l y V a l G l y l I e G l y S e r G l W a l G l y A s n A l a A s n S e r G I y V a l l l e C I y C l v C l n C l u L e u l l e A s p P r o S e r C l u C l r j e r C l n G l y
HS
CGCCCCGACG AGGAGGAGGA MCATCATCC CCTGCCCCCC CCAGCTGCCC ACACCAACAG CCCTCTCAGC CTGATCTGAT CTGCATCCAC GTACACATAC A r g A l a A s p G luclffilffil u T h r S e r S e r P r o G l y A l a P r o s e r C y s A l a A s p C l n G l n P r o S e r G l n P r o A S p L e u l l e C y s l l e H i S V a l H i S l l e P CCMAAAATT AAATCCATTT TTCMTCGAC TCCTAAGCGG TGTATTTTTT TTTTTTTTTT TTTTTGTGGA T t a a c t t a a a gcatgtgcaa t t a t a a t g a g r o L y s L y s L e u A s n P r o P h e P h e A s n 4 r q L e u L e u S e r C l v V a l P h e P h e P h e P h e P h e P h e P h e V a l A s e
H 6 f s ( g y p s Y ) 18001 aaaaacccta aatataaagc a t a a a a c c t a agcctaaggt c t a g c t a c t t a c a t t c t a c t a t a t g c a t a a t c a t t t t t g c c g t a t z a t c t a a t c t t a g c
a c c t c c a t a t t t t t t t t t t t t c g c c a a a a c g a a t a t t g t t a t t g c c t a g t g g g c a g t a c a g t a t c t c a a g g c g a a t c a a a t g c t g t g c t t a a g t t c t t a t t t c g g t t t t g t a t t t t a a t c a c t t a t a c t c t c t a t t t t t t t a c a t t t t t t t t t g t g a c t t g g c t t g a a t t t a t t t a g t a c t g g a a t t a a t t t a a t t g a t t ggttaagcat t t t t t a t t g t t a t t t g t t t t t a t a t a c g c g g c t c a a a c a a t a t t a c t t t t a a t a a t a t a a a t a c a c g a a t a c a t a c a a c a catctatgaa 18401 tgaaaaagcg aacgaatgaa acaataaaTA CATATATATA ATTTAAAACA AAAGAGAGM TGCTTTAGTC GACTTCCCCG A T M T T A T A T M C C G T G T C T
CACCTAGTGT GMTGCGAGC CCGAAATTTT ATTATTTTTC TTGGATATCG ATAGGTATTG G M M T A A A A TGAGAAAAAT TGATCAAAAG TGTGGGCGTG CCACTTTTAG CCGGCTTGTG GGCGGTAGAG TGGGCGTGGC ATCGTGGGCC CCATGTTTTC CTTTGCCGTT GGGGTGTGGT M G T T T T T C T T M T T T T G T A M T C M T T T T A T A C C T M A T TCCAGCCACC GCGCCMCCA ACTCGGTCTC CCTAACAMT TTTGGTCGGA ACGTATCGAC GTATTGTATA CTTGACATAC 18801 TCCGAWCCA ATACATTTTC M C G M T C T A GTATACCCTT TTACTCTACG AGTAACGTGT TTMCTACGT GTTGTGACCT TCTAGCTTGG G C W T G G G
TGCTGGTTGC AGTGATCCCT CTGTMTCCT TTCCCTTCAT ATTCTACTTG TTGCTTTTCT TGCAGTCCAG ACCCATTGAT ACCGCTGGCG CCATCATTTG MTCATCGAG TACAWATAT CCAGGMGTC TCTCTATTGG CAATACAATT MGATTTTTG AACATATMC ATCCTTTTTT ATCGGTATAG M C A T M G G T TWAGGTCCT T M A C T C C M TAACTCACTT ACTAATMTA AGCACATCGA TTTCCAAACT TCTGTATCTC TCACATTCGT T T A M A G T T T W C A G C M 19201 AAAAAAGCCA TGTTTGTCGG TCCGTCGGTA TCTCGTGATC TGAGGCACTA TACAGGCTAG AAACGTGTAT AAGATATCTT CCCACACCCC W A C C C M T A ACTCCCMAT CACCAAGAAA CCATCTCCTT TCCTCATAAT TTCATCGTTT TATTACCAAC TTATGTCATC CTGCCTATGC ACCTATTGAT GTTCCGCACA AAGCCCTTC
Ankyrin
-
I'k I e consensusRepeat #1
Repeat #2
Repeat #3
Repeat #4
Repeat #5
forked consensus
FIGURE 3.-Continued.
FIGURE 4.-Alignment for the ankyrin-like repeats encoded by the 5.4-kb RNA. Those positions at which residues are identical to the ankyrin-like consensus are darkly shaded. Sites in which amino acids are chemically similar with consensus residues or are present in the major- ity of repeats are lightly shaded. Chemically similar amino acids are grouped as follows (SCHWARTZ and
DAYHOFF 1978): A, S, T. P, and G ;
T h e Drosophilaforked Gene 515
J
Canton S
f 3
f 1 7 kFIGURE 5,"Bristle phenotypes of
forked mutants. Scanning electron mi- crographs showing dorsal thorax o f Canton S.f',f''*,f',f",f',f" and
f flies. T h e forked phenotype o f the flies ranges from the mildf' mu- tant to the severe j h d and f '& mu-
tants.
L I
f 5
f K
scripts. T h e 5.6-kb RNA is encoded within 12 kb of genomic DNA from coordinate +6.4 to +18.4 of the chromosomal walk and has eleven exons. A 4.3-kb open reading frame in this mRNA could encode a protein of 157 kD.
The 5.4" transcript: A cDNA clone, designated
77X, composes cDNA C and was isolated from the screening of the random hexanucleotide primed li- brary. This clone hybridizes to 6.4-, 5.6- and 5.4-kb RNAs when used as a probe. Four exons are contained within the clone. T h e most 5' of these exons, named
C1,
hybridizes only to the 6.4- and 5.4-kb RNA. T h e relative abundance of these RNAs suggest that cDNA C, which is composed of clones 77X and 15T, corre- sponds to the 5.4-kb transcript. T h e three terminal exons of clone 77X are shown by sequence analysis tobe the same as exons B2, B3 and B4 in the 5.6-kb
RNA. T h e 5' end of the C1 exon does not contain an open reading frame and represents the transcribed, untranslated leader of the transcript. Sequence flank- ing the translation initiation site of the 5.4-kb RNA are in agreement with that found in other Drosophila genes (CAVENER and RAY 199 1). This transcript spans 18 kb of genomic DNA from interval +0.4 to
+
18.4 of the chromosomal walk and has a 4.3-kb open read- ing frame that encodes a polypeptide of 155 kD.516 K. K. Hoover, A. J. Chien and V. G. Corces
in transcript A. A fragment that represents exon D l (Figure l), hybridizes on Northern blots to both the 1.9- and 1.1-kb RNAs, suggesting that cDNA D, which contains clone 2W, represents the 1.9 kb tran- script. Genomic sequence analysis indicates that a 90- bp open reading frame extends upstream of the 5‘ end of clone 2W. This open reading frame contains three translation start sites. Sequences flanking the first of these sites is most similar to sequences that are adjacent to the translation start sites of other Dro- sophila genes (CAVENER and RAY 1991). Assuming that this is the 5’ most exon of the 1.9-kb transcript and that the ATG site within this open reading frame is the translation start site of this transcript, the open reading frame of the 1.9-kb RNA extends 1.4 kb and encodes a 53-kD protein.
The 1.1-kb transcript: T h e structure of the 1.1-kb RNA was determined by Northern analysis using exon specific probes that were generated by PCR. No cDNAs were isolated for this RNA, and therefore the structure of this transcript is only tentative. T h e 1.1- kb transcript hybridizes to probes representing exons D l , A3 and A6. Exon A6 was further divided into two equal 550-bp fragments and each were used as probes. T h e 1.1-kb RNA hybridizes only to the frag- ment corresponding to the terminal half of exon A6. These results show that the 1.1- and 1.9-kb RNAs probably share the same promoter and that alternative splice selection occurs in the 1. l-kb RNA. Analysis of consensus splice sequences suggests that the splice acceptor site for the last exon of the 1.1-kb RNA is close to the untranslated sequence in the terminal exon shared by all forked RNAs.
The 6.4-kb transcript: Northern analysis indicates that the 6.4- and 5.4-kb RNAs hybridize to the same genomic and exon specific probes from interval -0.5 to +19.4 of the chromosomal walk. A genomic frag- ment that extends 500 bp upstream of the 5’ end of cDNA clone 77X of the 5.4-kb RNA, from coordi- nates -0.5 to 0, encodes multiple stop signals in all reading frames and selectively hybridizes to both of these RNAs. Probing Northerns with a 1.0-kb frag- ment that extends distally from -0.5 to -1.5, detects only the 6.4-kb RNA and reveals that the transcription start site of this RNA is upstream of that for the 5.4- kb RNA. Northern hybridizations using genomic frag- ments that extend 10 kb upstream of interval
-
1.5 as probes do not detect forked RNAs. Therefore, the transcription start site of the 6.4-kb RNA should be located between the XhoI site at position 0 and the BamHI site at -2.0 kb. Sequence of the genomic DNA in this region reveals no significant open reading frames. Based on these results, a tentative structure for the 6.4-kb transcript is represented in Figure 1, assuming that the only difference between the 5.4-and 6.4-kb RNAs is a longer transcribed untranslated region in the latter.
Structural analysis of the forked proteins: T o de- termine if a function could be predicted from the structural domains of the forked products, the amino acid structure of the proteins encoded by the 2.5-, 5.4- and 5.6-kb mRNAs was analyzed. A feature com- mon to all three of these proteins is the presence of stretches of at least 16 hydrophobic residues that may represent regions of the protein spanning the cell membrane. Two such hydrophobic stretches occur in exon A6 (Figure 1) which is common to all three RNAs encoded by nucleotides 17,714-17,777 and
17,909-17,972 of the genomic sequence (Figure 3). T h e coding region shared by the 5.6- and 5.4-kb transcripts has as many as three of these hydrophobic stretches in exon B4 (Figure l), spanning nucleotides 9,285-9,348, 9,450-9,512 and 9,822-9,870. T h e 5’ exon of the 5.4-kb RNA also contains a significant stretch of residues that commence at nucleotide 458 and includes three residues of the next exon. Hydro- phobic stretches of this size are characteristic of trans- membrane spanning peptides, a-helices, and mem- brane associated proteins. It is appealing that these hydrophobic stretches are present in proteins that are involved in the formation of microfibrillar structures of bristles that are apposed to the cell membrane.
T h e characterized transcripts of the forked locus contain several regions in which there are concentra- tions of specific residues. Thirteen of the 24 amino acids in exon A3, between nucleotides 16,558 and
16,630 in the genomic sequence, encode proline res- idues. T h e 5.4- and 5.6-kb RNAs have glycine rich stretches encoded in exon B4, between nucleotides 9,267-9,341 and 9,444-9,482 in which 13 of 25 amino acids and 11 of 13 amino acids are glycine, respectively. Polyglycine tracts are devoid of structure and these peptide stretches may serve a role analogous to the hinge region of the products of Ultrabithorax (BEACHY, HELFAND and HOGNESS 1985).
All six forked transcripts contain an exon that en- codes a tandem repeat of alternating histidine and proline residues between nucleotides 16,478 and
16,535 in the genomic sequence (Figure 3). T h e re- peat is located in the third exon of the 2.5-kb RNA and the eighth exon of the 5.4- and 5.6-kb transcripts. This motif was first observed in the product of the paired gene and has been termed the PRD repeat, or
The Drosophilaforked Gene 517
has been used to probe developmentally staged North- ern blots. This screen demonstrated that many other transcripts contain the PRD motif, but its role has not been determined (FRIGERIO et al. 1986).
The forked gene has two copies of a sequence de- fined by the presence of a repeated (CAX), motif, where
X
is the nucleotide G, A or C, and n usually has a value of less than 30. These repeats are located within exon B4 (Figure 1) which is the fourth exon of the 5.4- and 5.6-kb RNAs. The (CAX), motif is pres- ent eighteen times from nucleotides 9,078 to 9,135 in the genomic sequence (Figure 3). The second re- peat is located near the 3’ end of the exon and contains thirty-two of these codons from nucleotides 9,987-10,094. This repeat has been found in orga- nisms ranging from yeast to humans, in genes that are developmentally or tissue specifically regulated (WHARTON et al. 1985a; SCHULTZ and CARLSON 1987; SUZUKI et al. 1988; CHANG et al. 1987; DUBOULE et al.1987). The (CAX), motif encodes a polyglutamine stretch that is often interrupted by histidine residues and has been recognized under the names of opa (WHARTON et al. 1985a,b), strep (KIDD, LOCKETT and YOUNG 1983), and the M repeat (SCHNEUWLY et al.
1986). In
D.
melanogaster, the opa repeat occurs at least 500 times in the haploid genome and has been found to be expressed differentially in a develop- mental profile (WHARTON et al. 1985b). The opa motif is associated with homeobox containing genes such as Antennupedia (LAUGHON et al. 1986), Deformed (RE-GULSKI et al. 1985), cut (BLOCHINGER et al. 1988), engrailed (KUNER et al. 1985) and fushi tararu (LAUGHON and SCOTT 1984). In the transcription factor Spl, two glutamine rich domains, containing approximately 25% glutamine residues, have been shown to be potent activators of transcription (COUREY et al. 1989). A chimaera containing the Spl zinc finger binding domain and the nonhomologous opa repeat of Antennapedia was observed to synergis- tically stimulate the expression of the SV40 enhancer, suggesting that the repeat may serve an evolutionary conserved structural or functional role (COUREY et al.
1989).
Products of the 5.6- and
5.4-kb
RNAs
containankyrin-like repeats: A search of the Genbank pro- tein database revealed significant homology between regions common to the three largest forked products and polypeptide stretches of proteins known to con- tain a repeated motif found in the ankyrin protein (LUX, JOHN and BENNETT 1990). The products of the 5.6- and 5.4-kb RNAs contain four and five copies of this ankyrin-like repeat, respectively. The motif has also been recognized as the Cdcl O/SW 16 repeat (BREEDEN and NASMYTH 1987). The amino terminally situated forked ankyrin-like repeats are arranged in tandem. This repeated motif is encoded in the first
four exons of the 5.4-kb transcript and the second, third, and fourth exon of the 5.6-kb RNA (Figure 3).
When conservative residue substitution is taken into consideration (Figure 4), the forked ankyrin-like re- peats show significant homology to the ankyrin-like consensus (LAMARCO et al. 1991). Additionally, posi- tion two is usually an arginine residue within the forked ankyrin-like repeats. As observed in the consensus for the 22 ankyrin repeats (LUX, JOHN and BENNETT
1990) and the six ankyrin-like repeats in the product of the Caenorhabditis elegans gene fem-1 (SPENCE, COULSON and HODGKIN 1990), position 17 of the forked ankyrin-like consensus is typically a glycine res- idue. Position 22 of the ankyrin-like consensus is a leucine residue as is the case of the consensus for the seven copies of this repeat motif in the product of the human bcl-3 gene (OHNO, TAKIMOTO and MC- KEITHAN 1990). The length of ankyrin-like repeats has generally been found to be 33 residues, but varies from the half repeats in the human NF-KB binding protein (BOURS et al. 1990), to the presence of non- contiguous repeats in the products of the yeast genes cdclO and SW16 (AVES et al. 1985; BREEDEN and NASMYTH 1987). Ankyrin-like repeats of varying numbers and lengths have been found in many classes of proteins in organisms ranging from yeast to hu- mans. The yeast cell cycle control genes cdcl0, SW14 and SW16, encode proteins that contain two noncon- tiguous repeats (BREEDEN and NASMYTH 1987; AN- DREWS and HERSKOWITZ 1989). Products of the Notch locus of D . melanogaster and the lin-12 and glp-I genes of C. elegans are involved in tissue differentiation and contain six contiguous ankyrin-like repeats in their cytoplasmic domains (GREENWALD 1985; YOCHEM and GREENWALD 1989). The fem-1 gene product of C. elegans is involved in sex determination and contains six contiguous ankyrin-like repeats (SPENCE, COUISON and HODGKIN 1990). The ankyrin-like motif has been observed in transcription factors as well. Studies of the heterodimeric DNA binding protein GABP indi- cate that the amino terminus of the
p
subunit contains four contiguous ankyrin-like repeats (LAMARCO et al.199 1). These repeats help stabilize GABP
p
interac- tion with the CY subunit (THOMPSON, BROWN andMCKNIGHT 199 1).
Analysis of spontaneousforked mutations: T o ana- lyze the regulatory and coding regions of the forked gene further and to determine the mechanism of mutagenesis by the insertion of transposable elements, eight spontaneous forked mutations were character- ized. Genomic DNA was isolated from f
’,
f’,
f K , f 36a, f ’4k, f z7k, f hd and f mutant stocks. This DNA wasthen treated with restriction endonucleases. Genomic Southerns containing DNA isolated from Canton S