Proc. Natl.Acad. Sci. USA Vol.82,pp.4673-4677,July 1985 Biochemistry
The
nucleotide
sequence of the gene for human protein C
(DNAsequenceanalysis/vitamin K-dependentproteins/bloodcoagulation) DONALD C. FOSTER, SHINJI YOSHITAKE, AND EARLW. DAVIE
DepartmentofBiochemistry, University of Washington, Seattle, WA 98195
ContributedbyEarl W. Davie,April 9, 1985
ABSTRACT Ahuman genomic DNA library was screened for the gene forprotein C byusingacDNA probecoding for the human protein.Three different overlapping A Charon 4A phagewere isolated that contain inserts for the gene for protein C.Thecomplete sequence of the gene was determined by the
dideoxymethod and shown to span about 11 kilobasesof DNA. The coding and3' noncoding portion of the gene consists of
eight exons and seven introns. The eight exons code for a preproleader sequence of 42 amino acids, a light chain of 155 aminoacids, a connecting dipeptide of Lys-Arg, and a heavy chain of 262 aminoacids. The preproleader sequence and the
connecting dipeptideareremoved duringprocessing, resulting in thematureproteincomposedofa heavy and alight chain
heldtogether by a disulfide bond. The heavy chain also contains thecatalyticregionforthe serine protease. Two Alu sequences and two homologous repeats of about 160 nucleotides were found in intronE.The sevenintrons in thegene for protein C arelocated inessentiallythe samepositionsintheamino acid sequence as the sevenintronsinthe gene for human factorIX,
while thefirstthree introns inprotein Carelocated in the same
positionsasthefirstthree in thegene for humanprothrombin.
Protein C isaprecursor to aserine proteasepresentin plasma thatplaysanimportant physiologicalrole in theregulationof blood coagulation (1, 2). Human protein C is a vitamin K-dependent glycoprotein containing nine residues of
-carboxyglutamic acid and one equivalent of
p-hydroxy-aspartic acid. Protein C shows considerable structural ho-mology with the other vitamin K-dependentplasma proteins
involvedinbloodcoagulation, includingprothrombin, factor VII,factor IX, and factor X. Protein C is synthesizedas a single-chain polypeptide that undergoes considerable pro-cessingtogiveriseto atwo-chain molecule heldtogetherby a disulfide bond. The two-chain form is converted to acti-vated protein Cby thrombinby the cleavage ofa12-residue
peptidefrom the amino terminus of theheavy chain (2). This reaction is greatly accelerated by the presence of
thrombomodulin (3). Activatedprotein Cregulates the co-agulationprocess bytheinactivation of factorVa (4, 5)and
factor VIIIa (4, 6) by minor proteolysis. Consequently,
individuals lacking protein C often have a history of thrombotic disease (7, 8).
Studies fromourlaboratory(9) andthatof others(10)have ledtotheisolation and characterization of thecDNAcoding
forhumanand bovineproteinC.Inthepresent
investigation,
thecDNA for humanproteinC has been used for theisolation ofoverlappinggenomic clones from a X Charon 4A phage
library. The nucleotide sequence of the gene was then
determinedandcomparedwith thegenesforhumanfactorLX
(11, 12)andprothrombin (13).
MATERIALS ANDMETHODS
Screening of the Genomic Library. A human genomic
library in X Charon 4A phage (14) wasscreened for genomic clones of human protein C by the plaque hybridization procedure ofBenton andDavisasmodifiedbyWoo (15) using acDNA for human protein C (9) as thehybridization probe. ThecDNAstartedatamino acid64ofhumanprotein Cand extendedtothe second polyadenylylation signal (9). It was
radiolabeled bynick-translationto aspecificactivity of 8 X 108
cpm/,ug
with all four radioactive ([a-32P]dNTP) deoxynucleotides. Theprobe wasdenaturedandhybridizedto the filters at a concentration of 1 x 106 cpm/ml in a
hybridization solution containing 6xNaCl/P,(lx NaCl/P,=
0.15MNaCl/0.015Msodiumcitrate,pH 7.0), 5xDenhardt's solution (1x = 0.02% polyvinylpyrrolidone/0.02% Ficoll/0.02%bovine serumalbumin), 0.1% sodiumdodecyl sulfate,100 ,ugofyeast tRNA perml, and50o formamideat 420C for 60 hr. The filters were washed in lx NaCl/P1 containing 0.1% sodium dodecyl sulfateat680C for1 hrand exposed to x-ray film for 16 hr. Positive cloneswere then isolated and plaque-purified.
DNA Sequence Analysis. Phage DNA was prepared from
positiveclonesby the liquid culture lysis methodasdescribed by Silhavy et al. (16). The genomic DNA inserts in the purified phage wereremoved by digestion withEcoRI and
thensubclonedinto pUC9 for subsequent restriction mapping and sequencing. In order toobtain overlappingDNA
frag-ments,theDNA insertsweredigested also withBglII, and thefragments corresponding to thegeneforprotein Cwere
subclonedinto the BamHI site of pUC9.
Thesequenceofgenomicfragmentscontainingthe genefor protein Cwasdeterminedbothby direct cloning of specific restriction fragments into the M13 phage cloning vectors
mplO, mphl, mpl8, and mpl9, as well as by the BAL-31 exonuclease method described by Guo et al. (17) and Yoshitakeetal. (12).
Dideoxy chain termination sequencing reactions were
carried out with 3"S-substituted deoxyadenosine
5'-[a-thioltriphosphate
(dATP[a-35S];
Amersham) essentiallyasdescribed in thesequencingmanualprovided by Amersham andrunonbuffergradient gelsasdescribedbyBigginetal. (18).More than90% of thesequencewasdeterminedtwo or
more times, and =50% was determined on both strands. DNA sequenceswerestoredandanalyzed bythe computer programs ofLarson and Messing(19).
M13 vectors mplO,
mphl,
mpl8, andmpl9,deoxynucle-otide triphosphates, and
dideoxynucleotide
triphosphates were purchased from P-L Biochemicals. Restrictionen-zymes, T4 DNAligase, bacterialalkaline phosphatase, and
the Escherichia coliDNApolymerase I(Klenowfragment) werepurchased fromNewEnglandBiolabsorfrom Bethesda ResearchLaboratories.
Abbreviation: kb, kilobase(s).
4673
Thepublicationcostsof this articleweredefrayedinpartbypagecharge
payment. This articlemustthereforebeherebymarked"advertisement"
4674
Biochemistry:
al. Acad. Sci. USA 82(1985)-EM
3~ ~ ~ ~ ~~~~~ E 3 C-- : 3 ~ E
0- V)Qa. U)U)dIU) ' (/)tfCD ~W) Y( d a U U )c XU) U) (n) LJw Qa.a- coQ. U)
1-ti1
I
~
(
I
I(
C1m
II
I
16
-
1
Ij
o 2 3 4 5 6 7 8 9 10 I
ExonlI Exon1l Exons1l, IV,andV ExonVI ExonVI] ExonV111
FIG. 1. Detailedrestrictionmapandsequencingstrategy for thegenefor humanprotein C.Thelocations of each of theeightexonsareshown withsolid bars. Thelengthanddirection of eachsequencingreactionare shownbythinarrows.
RESULTS ANDDISCUSSION another (PCX8) to the 3'
region,
and the third (PCX6) was AhumngeomicNAlbrar Q X 01page) nkCaron positiveto bothsets ofprobes.
4A
phuangenomicrDNA
libhararyi(2ax106
phage inCharo
obe Thegenomic
DNAinserts in PCX6 and PCX8weremapped
4Aupageroei
waCreen
redwithfaradiolabeled
clNpobes fore
by
single-
anddouble-restriction-enzyme digestion
followed humaproein. Thee dffernt psitie clnes ereby
agarosegel
electrophoresis,
Southernblotting,
andhy-isolated, and each was
plaque-purified.
These three clones bridizationtoradiolabeled 5' and 3'probes
derived from the exhibitedunique
patterns ofEcoRIfragments
upon electro- cDNAfor humanprotein
C. Thisanalysis suggested
that thephoresis
in 0.7% agarose but also containedfragments
in geneforprotein
Cwas presentin threeEcoRIfragments
of common with each other. Southern blothybridization
of 4.4,6.2,and 6.9kilobases(kb)oriented 5'to3'in thegenome.-digests
of these clones withprobes
made from the 5' and3' The 4.4-kbfragment
wasisolatedfromphage
PCX6, and the ends of the cDNA established thatoneoftheclones(PCX1) 6.2-kb and6.9-kbfragments
wereisolatedfromphage
PCX8;corresponded
to the 5'region
ofthe gene forprotein
C, eachwassubcloned into the EcoRI site ofpUC9.
Toprovide
AGTGAATCTG GGCGAGTAAC ACAAAACTTGAGTGTCCTTA CCTGAAAAAT AGAGGTTAGA GGGATGCTAT GTGCCATTGT GTGTGTGTGT TGGGGGTGGGGATTGGGGGT GATTTGTGAGCAATTGGAGG -2001
TGAGGGTGGAGCCCAGTGCC CAGCACCTAT GCACTGGGGA CCCAAAAAGG AGCATCTTCT CATGATTTTA TGTATCAGAA ATTGGGATGG CATGTCATTG GGACAGCGTC TTTTTTCTTGTATGGTGGCA -1871 CATAAATACA TGTGTCTTAT AATTAATGGTATTTTAGATT TGACGAAATA TGGAATATTA CCTGTTGTGC TGATCTTGGGCAAACTATAATATCTCTGGG CAAAAATGTC CCCATCTGAAAAACAGGGAC -1741
AACGTTCCTC CCTCAGCCAG CCACTATGGG GCTAAAATGA GACCACAICT GTCAAGGGTT TTGCCCTCAC CTCCCTCCCT GCTGGATGGCATCCTTGGTA GGCAGAGGTG GGCTTCGGGC AGAACAAGCC -1611 GTGCTGAGCT AGGACCAGGA GTGCTAGTGC CACTGTTTGT CTATGGAGAG GGAGGCCTCA GTGCTGAGGGCCAAGCAAAT ATTTGTGGTTATGGATTAAC TCGAACTCCA GGCTGTCATG GCGGCAGGAC -1481
GGCGAACTTG CAGTATCTCC ACGACCCGCC CCTGTGAGTC CCCCTCCAGG CAGGTCTATG AGGGGTGTGGAGGGAGGGCT GCCCCCGGGA GAAGAGAGCT AGGTGGTGAT GAGGGCTGAATCCTCCAGCC -1351
AGGGTGCTCA ACAAGCCTGA GCTTGGGGTA AAAGGACACAAGGCCCTCCA CAGGCCAGGC CTGGCAGCCACAGTCTCAGG TCCCTTTGCC ATGCGCCTCC GTCTTTCCAGGCCAAGGGTC CCCAGGCCCA -1221 GGGCCATTCC AACAGACAGT TTGGAGCCCA GGACCCTCCA TTCTCCCCACCCCACTTCCA CCTTTGGGGG TGTCGGATTT GAACAAATCT CAGAAGCGGC CTCAGAGGGA GTCGGCAAGA ATGGAGAGCA -1091 GGGTCCGGTA GGGTGTGCAG AGGCCACGTG GCCTATCCAC TGGGGAGGGT TCCTTGATCTCTGGCCAccA GGGCTATCTCTGTGGCCTTT TGGAGCAACC TGGTGGTTTG GGGCAGGGGTTGAATTTCCA -961
GGCCTAAAAC CACACAGGCC TGGCCTTGAG TCCTGGCTCT GCGAGTAATG CATGGATGTAAACATGGAGA CCCAGGACCT TGCCTCAGTC TTCCGAGTCT GGTGCCTGCA GTGTACTGATGGTGTGAGAC -831
CCTACTCCTG GAGGATGGGG GACAGAATCT GATCGATCCC CTGGGTTGGT GACTTCCCTG TGCMATCAAC GGAGACCAGC AAGGGTTGGATTTTTAATAA ACCACTTAAC TCCTCCGAGT CTCAGTTTCC -701 CCCTCTATGA AATGGGGTTG ACAGCATTAATAACTACCTC TTGGGTGGTT GTGAGCCTIA ACTGAAGTCA TAATATCTCA TGTTTACTGA GCATGAGCTA TGTGCAMAGC CTGTTTTGAG AGCT'TTATGT -571 GGACTAACTC CTTTAATTCT CACAACACCC TTTAAGGCAC AGATACACCA CGTTATTCCA TCCATTTTAC AAATGAGGAAACTGAGGCAT GGAGCAGTTAAGCATCTTGC CCAACATTGC CCTCCAGTAA -441
GTGCTGGAGC TGGAATTTGG ACCGTGGAGT GTGGCTTCAT GGCGTGCGCT GTGAATCCTG TAAAMATTGT TTGAAAGACACCATGAGTGTCCAATCAACG TTAGCTAATA TTCTCAGCCC AGTCATCAGA -311 CCGGCAGAGG CAGC-CACCCC ACTGTCCCCA C-GGACGACACAAACATCCTG GCACCCTCTC CACTGCATTC TGGAGCTGCT TTCTAGGCAG GCAGTGTGAG CTCAGCCCCA CGTAGAGCGG GCAGCCGAGG -181 CCTTCTGAGG CTATGTCTCT AGCGAACAAG GACCCTCAATTCCAGCTTCC GCCTGACGGCCAGCACACAG GGACAGCCCT TTCATTCCGC TTCCACCTGG GGGTGCAGGC AGAGCAGCAGCGGGGGTAGC -51
-42
Met TrpGin Leu Thr Ser Leu Leu Leu Phe Val Ala ThrTrp Gly Ilie Ser GiyThr ProAla
ACTGCCCGGA GCTCAGAAGT CCTCCTCAGA CAGGTGCCAG TGCCTCCAGA ATG TGGCAG CTC ACA AGC CTC CTG CTG TTC GTG GCC ACC TGG GGAATT TCC GGC ACA CCA GCT 63 -20
Pro Leu
CCT CTTGIGTAAGGCCAC CCCACCCCTA CCCCGGGACC CTTGTGGCCT CTACAAGGCC CTGGTGGCAT CTGCCCAGGC CTTCACAGCT TCCACCATCT CTCTGAGCCC TGGGTGAGGTGAGGGGCAGA 190
TGGGAATGGC AGGAATCAACTGACAAGTCC CAGGTAGGCC AGCTGCCAGA GTGCCACACA GGGGCTGCCA GGGCAGGCAT GCGTGATGGC AGGGAGCCCC GCGATGACCT CCTAAAGCTC CCTCCTCCAC 320
ACGGGGATGG TCACAGAGTC CCCTGGGCCT TCCCTCTCCA CCCACTCACTCCCTCAACTG TGAAGACCCC AGGCCCAGGC TACCGTCCAC ACTATCCAGC ACAGCCTCCC CTACTCAAAT GCACACTGGC 450
CTCATGGCTGCCCTGCGCCA ACCCCTTTCC TGGTCTCCACAGCCAACGGGAGGAGGCCAT GATTCTTGGG GAGGTCCGCA GGCACATGGGCCCCTAAAGC CACACCAGGC TGTTGGTTTC ATTTGTGCCT 580 TTATAGAGCT GTTTATCTGC TTGGGACCTG CACCTCCACC CTTTCCCAAG GTGCCCTCAG CTCAGGCATA CCCTCCTCTAGGATGCCTTT TCCCCCATCC CTTCTTGCTC ACACCCCCAA CTTGATCTCT 710 CCCTCCTAAC TGTGCCCTGC ACCAAGACAG ACACTTCACAGAGCCCAGGA CACACCTGGG GACCCTTCCT GGGTGATAGGTCTGTCTATC CTCCAGGTGT CCCTGCCCAA GGGGAGAAGCATGGGGAATA 840 CTTGGTTGGG GGAGGAAAGG AAGACTGGGG GGATGTGTCA AGATGGGGCT GCATGTGGTG TACTGGCAGA AGAGTGAGAGGATTTAACTT GGCAGCCTTT ACAGCAGCAG CCAGGGCTTGAGTACTTATC 970
TCTGGGCCAG GCTGTATTGG ATGTTTTACA TGACGGTCTC ATCCCCATGTTTTTGGATGA GTAAATTGAA CCTTAGAAAG GTAAAGACAC TGGCTCAAGG TCACACAGAG ATCGGGGTGG GGTTCACAGG 1100
GAGGCCTGTC CATCTCAGAG CAAGGGTTCG TCCTCCAACT GCCATCTGCT TCCTGGGGAG GAAAAGAGCA GAGGACCGCT GCGCCAAGCC ATGACCTAGA ATTAGAATGA GTCTTGAGGG GGCGGAGACA 1230 -19
AspSer Val PheSer Ser Ser
AGACCTTCCCAGGCTCTCCC AGCTCTGCTT CCTCAGACCC CCTCATGGCC CCAGCCCCTC TTAGGCCCCT CACGAAGGTGAGCTCCCCTC CCTCCAAAAC CAGY AC TCA GTG TTC TCC AGC AGC 1353 -1 +1
Gin ArgAla His Gin Val LeuArg Ilie Arg Lys ArgAia Asn Ser Phe LeuGiu Giu LeuArg His Ser SerLeuGiu ArgGiu Cys IlieGin Giu Ile CysAsp
GAG CGT GCC CAC CAG GTG CTG CGG ATC CGC MAACGT GGC AAC TCCTIC CTG GAG GAG CTC CGT GAG AGC AGC CTGGAG CGG GAG TGG ATA GAG GAGAIG TGT GAC 1458 37
Phe Giu GiuAia LysGi Ilie PheGin Ann Vai Asp Asp Thr
TTC GAG GAG GGC MAG GAAATT TIC GMAMI GTGGAT GAG ACAVYGTMAGGCCAC CATGGGTCGA GAGGATGAGG CTCAGGGGCG AGCTGGTMAC CAGGAGGGGCCTCGAGGAGC 1570
AGGTGGGGAC TCMATGCTGAGGCGCTGTTA GGAGTTGTGG GGGTGGGTGA GTGGAGCGAT TAGGATGGTGGCCCTATGAT GTCGGGGGAGG GAGATGTGAG TGCAAGMAAC AGMATTCAGGAAGMAGCTCC 1700
AGGAAAGAGT GTGGGGTGAC GCTAGGTGGG GACTGGGACA GGGAGAGTGT AGGTGGITGAGTGGAGCGTG GAGGGACTGC TGAGGACCAC TGCCTCCGGG TGCGAGGTCA CAMAGAGGGG ACCTAMAGAC 1830
GACGCTGCTT CCAGGCATGC CTCTGCTGAT GAGGGTGTGTGTGTGAGGGA AAGTGACTTG TGTGGAGATA MAATCGCTCAGIGTGTGGGT CACATCAMAG GGAGAAMATC TGATTGITGA GGGGGTGGGA 1960 AGACAGGGTC TGTGTGGTATTTGTGTAAGG GTCAGAGTCC TTTGGAGGGG CCAGAGTGCT GTGGACGTGG GGGTAGGTAGTAGGGTGAGG TTGGTMACGGGGCTGGCTTG CTGAGACMAG GCTCAGACCC 2090
GGTCTGTCGC TGGGGATCGC TTGAGCCACC AGGACGTGAAMATTGTGGAC GCGTGGGGGG CCTTCCAAGG GATGGAGGGAIGGCTTGGAG TGGAGGGTTT GAGGGGAGGA GACCGTGTGG CCTGCACCCT 2220
GTGTTGGGGI CAGCCTGCAC CTCCTTGACT GGAGCGGCAT GIGGAGCCGG ATCCGGAGCACCTCGTTIGGG GAGTGGCCTG GGIGGGAGAG ACCACAGTGA GITTIGCGAG GCACATATCT GATCACATCA 2350
AGTCCCCACC GTGCTGGCAG CTGAGGGATG GTGTGTGAGG CGCAGCAGGC TTGGGTGGGG TCTCTGATGG AGCAGGCATC AGGCAGAGGC CGTGGGTGTCAACGTGGGCT GGGTGGTGCC GGAGGAGCAG 2480
GAGGCGCCGC AGGAGCMACG GIGGTAGGIG GTTAGGAAGG GAGACCCTCT GCGCGCATGC GCCCAACTGI GAAAMAGCAT GGGTTAGGGAAAGGGCGGAT GCTGAGGGGT CCCCCAMAGC CGGGAGGCAG 2610 AGGGAGTGATGGGAGTGGAA GGAGGGCGAG IGACTTGGTGAGGGATTCGG GTGCCTTGGA TGGAGAGGGT GGIGTGGGAG CGGACAGTGG GGAGAGGAGG ACGCGAGGIG GATGGGGAGA GGGTGTTGCT 2740
GGAGGGAGGT GGGATGGAGGGTGGGGGGGG GCGGGGTGGCG GTGGAGGGCG GGGGAGGGGC AGGGAGCACC AGGTGGIAGC AGGGAACGAG GATCGGGGGTGGATGGCGTG TTGTGTGGAA GCCCTCCGCC 2870
38 45
GLen Ala PheTrp Ser LysHis VaI
GCCCTGCCCG GIGACCGCGG GGGGIGCGGG AGGGGGGGGG GGCGCTCGCG AGAGGGGCTG GAGGAGGGIG AGGGTGGGCGGITGITCGGG AGYCTG GCG TIC TGG TCCAAGGAG GIG GYGTGAGT 2993
46
AspGiy Asp GinGys Len Val Len Pro LeuGin
GGGIIGIAGA IGGGGGGCIG GACIACGGGG GGGGGGGCGCC CIGGGGAIGI GIGCGGGGGG AGGGGGIAGG GGGCCTIGIG IGGGAGV AG GGI GAG GAG IGG IIGGIG IIG CCCIIGGAG 3111 91
His Pro CysAla Ser Leu CysCys Gly His Gly ThrCys Ile Asp Gly Ile Gly Ser Phe Ser CysAsp CysArg Ser Gly TrpGlu Gly Arg Phe Cys Gln Arg
CAC CCG TGC GCC AGC CTG TGC TGC GGGCAC GGCACG TGC ATC GAC GGC ATC GGC AGC TTCAGC TGC GAC TGC CGC AGC GGC TGGGAG GGC CGCTICIGCGAG CGC 3216 92
GluVal Ser Phe LeuAsn
GYGTGAGGGGGAGAGGTGG ATGCTGGCGGGCGGCGGGGGC GGGGCTGGGGCCGGGTTGGG GGCGCGGCAC CAGCACCAGC TGCCCGCGCC CTCCCCTGCC CGCAGV AGGTGAGC TTC CTCAAT 3336
Biochemistry: Foster et al. Proc. Natl. Acad. Sci. USA 82 (1985) 4675
Cys Ser Leu Asp Asn Gly GlyCys ThrHis Tyr Cys Leu Glu Glu Val Gly Trp Arg Arg Cys Ser CysAla Pro Gly Tyr Lys Leu Gly Asp Asp Leu Leu Gln
TGC TCTCTG GAC AACGGC GGC TGCACG CAT TAC TGC CTA GAG GAG GTGGGC TGG CGG CGC TGT AGC TGTGCG CCT GGCTACAAG CTG GGG GAC GAC CTC CTG CAG 3440 136
CysHis Pro Ala
TGTCAC CCC GCAGYGTGAGAAGCC CCCAATACAT CGCCCAGGAA TCACGCTGGG TGCGGGGTGGGCAGGCCCCT GACGGGCGCG GCGCGGGGGG CTCAGGAGGG TTTCTAGGGA GGGAGCGAGG 3564 AACAGAGTTG AGCCTTGGGG CAGCGGCAGA CGCGCCCAAC ACCGGGGCCACTGTTAGCGCAATCAGCCCG GGAGCTGGGC GCGCCCTCCG CTTTCCCTGC TTCCTTTCTT CCTGGCGTCC CCGCTTCCTC 3694
CGGGCGCCCC TGCGACCTGG GGCCACCTCC TGGAGCGCAA GCCCAGTGGT GGCTCCGCTC CCCAGTCTGAGCGTATCTGG GGCGAGGCGT GCAGCGTCCT CCTCCATGTA GCCTGGCTGC GTTTTTCTCT 3824
GACGTTGTCCGGCGTGCATC GCATTTCCCT CTTTACCCCC TTGCTTCCTT GAGGAGAGAACAGAATCCCG ATTCTGCCTT CTTCTATATT TTCCTTTTTA TGCATTTTAA TCAAATTTAT ATATGTATGA 3954
AACTTTAAAA ATCAGAGTTT TACAACTCTT ACACTTTCAG CATGCTGTTC CTTGGCATGG GTCCTTTTTT CATTCATTTT CATAAAAGGT GGACCCTTTT AATGTGGAAA TTCCTATCTT CTGCCTCTAG 4084
GGCATTTATC ACTTATTTCT TCTACAATCT CCCCTTTACT TCCTCTATTT TCTCTTTCTG GACCTCCCAT TATTCAGACC TCTTTCCTCT AGTTTTATTG TCTCTTCTAT TTCCCATCTC TTTGACTTTG 4214
TGTTTTCTTT CAGGGAACTTTCTTTTTTTT CTTTTTTTTT GAGATGGAGT TTCACTCTTG TTGTCCCAGG CTGGAGTGCA ATGACGTGAT CTCAGCTCAC CACAACCTCC GCCTCCTGGATTCAAGCGAT 4344
TCTCCfTtdGCAGdtt:GAGTAGCTGGGATTACAGGCA TGCGCCACCA CGCCCAGCTA ATTTTGTGTT TTTAGTAGAGAAGGGGTTTCTCCGTGTTGG TCAAGCTGGT CTTGAACTCC TGACCTCAGG 4474
TGATCCACCT GCCTTGGCCT CCTAAAGTGC TGGGATTACA GGCGTGAGCC ACCGCGCCCGAGGGTqTT~ GG.GMTN.TACAACTTTA TAATTCAATT CTTCTGCAGAAAAATTTT TGGCCAGGCT 4604
CAGTAGCTCA GACCAATAAT TCCAGCACTT TGAGAGGCTG AGGTGGGAGG ATTrrTTrAr CTTGGGAGTTTGAGACTAGC C*TGGGCAACArArTrArArr CTGTCTCTATTTTTAAAAAA AGTAAAAAAA 4/34
LzbP~iILAX IRCALLARIRIM I,LLAbALRII I bAbAbli bMlXIbblbbbib AIbllXIIab CIIllbbAGII IbbAAI~bCl6XIbbbLMAALR lbibAbRCXLLLblXXIAIII IIIIMAMAAAbIMMARARR SQ/'
GATCTAAAAA TTTAACTTTTTATTTTGAAA TAATTAGATA TTTCCAGGAA GCTGCAAAGA AATGCCTGGTGGGCCTGTTG GCTGTGGGTT TCCTGCAAGGCCGTGGGAAGGCCCTGTCAT TGGCAGAACC 4864
CCAGATCGTG AGGGCTTTCC TTTTAGGCTG CTTTCTAAGA GGACTCCTCC AAGCTCTTGGAGGATGGAAGACGCTCACCC ATGGTGTTCG GCCCCTCAGA GCAGGGTGGG GCAGGGGAGC TGGTGCCTGT 4994 GCAGGCTGTG GACATTTGCA TGACTCCCTG TGGTCAGCTA AGAGCACCAC TCCTTCCTGA AGCGGGGCCT GAAGTCCCTA GTCAGAGCCT CTGGTTCACC TTCTGCAGGCAGGGAGAGGG GAGTCAAGTC 5124 AGTGAGGAGG GCTTTCGCAG TTTCTCTTAC AAACTCTCAA CATGCCCTCC CACCTGCACT GCCTTCCTGG AAGCCCCACA GCCTCCTATG GTTCCGTGGT CCAGTCCTTC AGCTTCTGGGCGCCCCCATC 5254
ACGGGCTGAG ATTTTTGCTT TCCAGTCTGC CAAGTCAGTT ACTGTGTCCA TCCATCTGCT GTCAGCTTCT GGAATTGTTG CTGTTGTGCC CTTTCCATTC TTTTGTTATG ATGCAGCTCC CCTGCTGACG 5384 ACGTCCCATT GCTCTTTTAA GTCTAGATAT CTGGACTGGG CATTCAAGGC CCATTTTGAG CAGAGTCGGG CTGACCTTTC AGCCCTCAGT TCTCCATGGA GTATGCGCTC TCTTCTTGGC AGGGAGGCCT 5514
CACMACATGCCATGCCTAT TGTAGCAGCT CTCCAAGAAT GCTCACCTCC TTCTCCCTGT AATTCCTTTC CTCTGTGAGG AGCTCAGCAG CATCCCATTA TGAGACCTTA CTAATCCCAG GGATCACCCC 5644
CAACAGGCCTGGGGTACAAT GAGCTTTTAA GAAGTTTAAC CACCTATGTA AGGAGACACAGGCAGTGGGC GATGCTGCCT GGCCTGACTC TTGCCATTGG GTGGTACTGT TTGTTGACTGAGTGAGTGAG 5774
TGACTGGAGGGGTTTGTAA TTTGTATCTC AGGGATTAGG GGGAAGAGG GTGGGGTAGA ATGAGGGTTG AAGAAGTTTAAGCAATATG TAAGGACACA CAGCCAGTGG GTGATGCTGC CTGGTCTGAC 5904
TGTTGGGATT-AGAGgCT GTTTGTTGAC TGACTGACTG ACTGACTGGCTGAGTGGAGG GGGTTCATAG CTAATATTAA TGGAGTGGTC TAAGTATCAT TGGTTCCTTG AACCCTGCAC TGTGGCAAAG 6034 137
VJal Lys Phe Pro Cys Gly Arg ProTrp Lys Arg TGGCCCACAGGCTGGAGGAG GACCAAGACAGGAGGGCAGT CTCGGGAGGA GTGCCTGGCA GGCCCCTCAC CACCTCTGCC TACCTCAGVTG AAG TTC CCT TGT GGG AGG CCC TGGAAG CGG 6154
MetGlu Lys Lys Arg Ser His LeuLysArgfAspThr G1uAsp Gin GiUAsp Gin Val Asp Pro ArgkLeu Ile Asp Gly Lys Met ThrArg Arg Gly Asp Ser Pro
ATG GAG AAG MGCGC AGT CAC CTG AMCGA GAC ACA GAA GACCM GMGAC CAA GTA GAT CCG CGG CTC ATT GAT GGG MG ATG ACCAGG CGGGGA GAC AGC CCC 6259
184
Trp Gin
TGG CAGYGTGGGAGGCGAGGCAGCACC GGCTCGTCAC GTGCTGGGTC CGGGATCACT GAGTCCATCC TGGCAGCTAT GCTCAGGGTGCAGMACCGA GAGGGAAGCGCTGCCATTGC GTTTGGGGGA 6385
TGATGMGGT GGGGGATGCTTCAGGGAMG ATGGACGCMCCTGAGGGGA GAGGAGCAGCCAGGGTGGGT GAGGGGAGGG GCATGGGGGC ATGGAGGGGTCTGCAGGAGG GAGGGTTACA GTTTCTAMA 6515
AGAGCTGGMAGACACTGCT CTGCTGGCGG GATTTTAGGC AGAAGCCCTG CTGATGGGAG AGGGCTAGGA GGGAGGGCCG GGCCTGAGTA CCCCTCCAGC CTCCACATGG GMCTGACAC TTACTGGGTT 6645
CCCCTCTCTG CCAGGCATGG GGGAGATAGGAACCMG MG TGGGAGTATT TGCCCTGGGG ACTCAGACTC TGCMGGGTC AGGACCCCM AGACCCGGCA GCCCAGTGGG ACCACAGCCA GGACGGCCCT 6775
TCMGATAGGGGCTGAGGGA GGCCMGGGG MCATCCAGG CAGCCTGGGGGCCACMAGT CTTCCTGGM GACACMGGC CTGCCAAGCC TCTMGGATG AGAGGAGCTC GCTGGGCGATGTTGGTGTGG 6905
CTGAGGGTGACTGAAACAGT ATGMCAGTG CAGGMCAGC ATGGGCAMG GCAGGMGAC ACCCTGGGAC AGGCTGACAC TGTAAMTGG GCAAAAATAGMAACGCCAG AAAGGCCTAAGCCTATGCCC 7035
185
Vai Val Leu LeuAsp Ser Lys
ATATGACCAG GGAACCCAGG AAAGTGCATA TGAMCCCAG GTGCCCTGGA CTGGAGGCTG TCAGGAGGCA GCCCTGTGAT GTCATCATCCCACCCCATTC CAGVGTG GTC CTG CTG GAC TCAMG 7159
0 223
Lys Lys Leu AiaCys GlyAia Vai Leu IleHis Pro SerTrp Vai Leu Thr Aia Ala HisCys MetAsp Glu SerLys LysLeu Leu Val Arg Leu
MG MGCTG GCC TGC GGGGCA GTG CTC ATC CAC CCC TCCTGG GTGCTG ACA GCG GCC CAC TGC ATG GAT GAGTCC AAGAAGCTC CTT GTC AGG CTT G V GTATGGGCTG 7266
GAGCCAGGCA GAAGGGGGCT GCCAGAGGCCTGGGTAGGGG GACCAGGCAG GCTGTTCAGG TTTGGGGGAC CCCGCTCCCC AGGTGCTTM GCMGAGGCT TCTTGAGCTC CACAGAAGGT GTTTGGGGGG 7396
AAGAGGCCTA TGTGCCCCCA CCCTGCCCAC CCATGTACAC CCAGTATTTT GCAGTAGGGG GTTCTCTGGT GCCCTCTTCG AATCTGGGCA CAGGTACCTGCACACACATGTTTGTGAGGG GCTACACAGA 7526
CCTTCACCTC TCCACTCCCA CTCATGAGGA GCAGGCTGTG TGGGCCTCAG CACCCTTGGGTGCAGAGACC AGCMGGGCCT GGCCTCAGGG CTGTGCCTCC CACAGACTGA CAGGGATGGA GCTGTACAGA 7656
GGGAGCCCTA GCATCTGCCA MGCCACMGCTGCTTCCCTAGCAGGCTGG GGGCTCCTATGCATTGGCCC CGATCTATGG CAATTTCTGG AGGGGGGGTC TGGCTCMCT CTTTATGCCA AAAAGAAGGC 7786
AAAGCATATT GAGAAAGGCCAMATTCACAT TTCCTACAGC ATAATCTATG CCAGTGGCCC CGTGGGGCTT GGCTTAGMT TCCCAGGTGC TCTTCCCAGGGMCCATCAG TCTGGACTGAGAGGACCTTC 7916 TCTCTCAGGT GGGACCCGGC CCTGTCCTCC CTGGCAGTGC CGTGTTCTGG GGGTCCTCCT CTCTGGGTCT CACTGCCCCT GGGGTCTCTC CAGCTACCTT TGCTCCATGT TCCTTTGTGG CTCTGGTCTG 8046
TGTCTGGGGT TTCCAGGGGT CTCGGGCTTC CCTGCTGCCC ATTCCTTCTC TGGTCTCACG GCTCCGTGACTCCTGMAAC CAACCAGCAT CCTACCCCTT TGGATTGACA CCTGTTGGCC ACTCCTTCTG 8176
GCAGGMAAG TCACCGTTGA TAGGGTTCCA CGGCATAGAC AGGTGGCTCC GCGCCAGTGC CTGGGACGTG TGGGTGCACA GTCTCCGGGT GAACCTTCTT CAGGCCCTCT CCCAGGCCTG CAGGGGCACA 8306
224
GlyGlu TyrAsp LeuArg Arg TrpGlu Lys Trp Glu LeuAsp GCAGTGGGTG GGCCTCAGGA MGTGCCACT GGGGAGAGGC TCCCCGCAGC CCACTCTGACTGTGCCCTCT GCCCTGCAGYGA GAG TATGAC CTG CGGCGC TGG GAGMG TGG GAGCTG GAC 8426
0
LeuAsp Ile Lys Glu Val Phe Val HisProAsn Tyr Ser Lys SerThr ThrAspAsn Asp IleAla Leu LeuHis Leu Ala Gln ProAla Thr Leu SerGln Thr
CTG GAC ATC MG GAG GTC TTC GTC CACCCCAAC TAC AGC MGAGC ACC ACC GAC MT GAC ATC GCA CTG CTG CAC CTG GCC CAGCCC GCC ACC CTC TCGCAG ACC 8531 Ile Val Pro Ile Cys Leu Pro Asp SerGly LeuAla Glu Arg GluLeu AsnGln Ala Gly Gin Glu Thr LeuVal Thr Gly Trp Gly Tyr His Ser Ser Arg Glu
ATAGTG CCCATC TGC CTC CCG GAC AGC GGC CTT GCA GAG CGC GAG CTC MTCAG GCC GGC CAG GAGACC CTC GTGACG GGC TGG GGC TAC CAC AGC AGC CGA GAG
Lys Glu AlaLys Arg Asn ArgThr Phe Val LeuAsn Phe Ile Lys Ile ProVal Val Pro His Asn Glu Cys SerGlu Val Met SerAsn MetVal Ser GluAsn AAG GAG GCC AAG AGA MC CGC ACC TTC GTC CTC MC TTC ATC AAG ATT CCC GTGGTC CCGCACAAT GAG TGCAGC GAG GTCATG AGC AACATG GTGTCT GAG AAC
0
Met Leu Cys Ala Gly IleLeu GlyAsp Arg GlnAspAla Cys Glu Gly AspSer Gly GlyProMet Val Ala Ser Phe His Gly Thr Trp PheLeuVal Gly Leu
ATG CTG TGT GCG GGCATC CTC GGG GAC CGG CAGGAT GCC TGC GAGGGC GAC AGTGGG GGG CCC ATGGTC GCC TCC TTC CACGGC ACC TGG TTC CTG GTGGGC CTG
8636
8741
8846
Val Ser Trp Gly Glu Gly Cys Gly Leu Leu HisAsn Tyr Gly Val Tyr Thr Lys Val SerArg Tyr LeuAsp Trp Ile His Gly His Ile Arg Asp Lys GluAla GTG AGC TGG GGT GAG GGC TGT GGGCTC CTTCAC MC TAC GGCGTT TAC ACCMA GTC AGC CGC TAC CTC GAC TGGATC CAT GGGCAC ATCAGAGAC AAG GAAGCC 8951
419
Pro Gin LysSer Trp Ala ProSTOP
CCC CAGMG AGC TGG GCACCT TAG CGACCCTCCCTGCAGGGCTG GGCTTTTGCA TGGCMTGGA TGGGAGATTA':GGGACATG TAACAAGCAC ACCGGCCTGC TGTTCTGTCC TTCCATCCCT 9075
CTTTTGGGCTCTTCTGGAGG GMGTAACAT TTACTGAGCA CCTGTTGTAT GTCACATGCC TTATGMTAG MTCTTAACT CCTAGAGCAA CTCTGTGGGG TGGGGAGGAGCAGATCCAAG TTTTGCGGGG 9205
TCTAAAGCTG TGTGTGTTGA GGGGGATACT CTGTTTATGA AAAAG~AAACAGAACCACGMGCCACTAGAGCCTT TTCCAGGGCT TTGGGAAGAG CCTGTGCAAG CCGGGGATGC TGMGGTGAG 9335 GCTTGACCAG CTTTCCAGCT AGCCCAGCTA TGAGGTAGAC ATGTTTAGCT CATATCACAG AGGAGGAAAC TGAGGGGTCT GMAGGTTTACATGGTGGAGCCAGGATTCAMTCTAGGTC TGACTCCAM 9465
ACCCAGGTGC TTTTTTCTGTTCTCCACTGT CCTGGAGGACAGCTGTTTCGACGGTGCTCA GTGTGGAGGC CACTATTAGC TCTGTAGGGA AGCAGCCAGA GACCCAGAAA GTGTTGGTTCAGCCCAGAAT 9595
FIG. 2. Nucleotide sequence for the gene for humanproteinC. The first base of the methionine codon wheretranslation is initiated is numbered+1. Arrowheads indicate intron-exonsplicejunctions.ThetwoAlu sequences in intron Ehave been underlined withasolidline; the 18-base repeatsflankingthe first Alu sequence and the 8-base repeatsflankingthesecond Alu sequence have been underscored with dots. Thehighly conserved sequences of C-C-A-G-C-C-T-G-G have been underlined withaheavysolidline, contrastingwith thetwohomologous 160-bp repeats in intron E which have been lightly underlined. The polyadenylylation or processing sequences of A-T-T-A-A-A and A-A-T-A-A-Aatthe3' endareboxed. TheconsensusofC-T-T-T-G,which also may be involved inpolyadenylylationorcleavageof mRNA at the 3'end, is underlined withawavyline. *, Potential carbohydrate bindingsitestoasparagine residues; ,apparent cleavage sites for processing of theconnecting dipeptide; I, siteofcleavageintheheavychainwhenproteinCisconvertedtoactivatedprotein C;o, active site asparticacid,histidine,andserineresidues;*, sites ofpolyadenylylation.
DNA sequenceoverlappingthetwoEcoRIjunctionsbetween the three fragments, twoBglIIfragmentsof 3.3 and 7.0 kb wereisolatedand subcloned into the BamHI siteofpUC9. These twoclones span the EcoRI sites.
A detailed restriction map as well as approximate place-ment ofthe exon regions within the subcloned fragments wereestablishedby further restrictionanalysisandSouthern
blotting (Fig. 1). When the 5' and3' ends of the genewere established, thenucleotide sequence of the gene was
deter-mined by the dideoxy chain-termination method using
nucleaseBAL-31toprovide overlappingsequences between theendsoflargerestrictionfragments.
Thenucleotidesequenceforthe geneforhuman
protein
Cspans -11 kbofDNA(Fig. 2). Comparison ofthe
genomic
sequence with that ofthe cDNA (9) revealed that the gene
consists of eight exons
ranging
in size from 25 to 885 nucleotides andsevenintronsranginginsize from 92to2668nucleotides. An additional intron(s) in the 5'
noncoding
region cannot be ruled out because a cDNA
covering
this regionwasnotavailable forcomparison
with thegene.Also,
LG D Y L 44 G L IntronE P~A ~c (137) GG 5CCHP AVKFPCGRPWKRMEKKR DtL H H Lp- 155 LH .8 97-LL9L V L T IDNH2 OGWER SF RE T H tonGE S F V E(5D) D CCR A LR G r ACTIVATION SL 020N p E (15/16) E A C.C A A T L vV PE C rSF S L K L L W 0 P5G 0 H R K L F V Hr H E L R K 0 R D K G T Pr8L e CS LVL 71TIDP KCMK (Thrombnb T 0 L 4c L T 0 GD T D 9C8 MKG H (46) S K L V NH? K S K v L H -42 M w E ON79 V Sp H H D0 0 EK PHVF K L A w E T
TL OnfoB K CATALYTIC DOMAIN ADH1F~A
D(37/38)OR VAS HG W L F~ L D N144 M V Y 0 L v RNM V p 0G R K L N T S s G L S S F F M K . V v w V 0 mG011VEN K A A I L E SG E W r29 N-5 L E GCOOH G5
A FIKIPVVPHNEC AGILGDRODD LNY4ts
F Int~~~~~~~~~~~~PonA p' 4
C78RrLSSHRLFFLFSNo.AKRIRLVQAESFS
r "mass" ~~~~~PRE-POO LEADER
Protein
C
Proc. NatL. Acad. Sci. USA 82(1985)
R N G K IntronE Q ET -S (128) 14 FactorXI, R NGRCCS EPAVPFPGRVSVSQTSKLT&jEAVFPDVDYVN-12 KC. FO V CIT NF K A T E
<°FpFNNN DSC.CsEAAN EE N TTE EK1nfronn G EAACTIVATIONPEPTIDE
0 PC.CELDT L H AG A54 L W InfronD N v T v D 'IE (85) F pL aE V IntronF N*22 pLNy KL E K I VV05/16) T N SN K D R K t 0 F L N K GN w s ER SG E V G K T CK_ 6 S L RI v QF D A4Y 41 V A G FC DR LR LC - C C P N NHCN2 rvLOCVKp 8890 ID -46SD D-IFntronCS H H A G A F N G D T DO GIAI N~ rssc PE-PR5f R Dv R N TIFH G FactorXI R K V N N V I N F F IA KEN T 40 Hi T K CATALYTIC DOMAIN M T Int~ronB RGw KEA R 3/9 S VTEVEGTSF N K E *AL H V K3 N L FT P G R K FL T N G 5 1235 L V ~~~~~~SN 0)t85 S V COOH T 3 8 M -D WE K R L L F E5 T c A RvP C-C QC-C L RT AGFHESGGRDS AMKGKYGSGL rF Y A s IntronA L Ci' C, 7 (~(-17) L~ KC3R3LNG0VFTrTLKG;SN®MKPRNLIKNANEHDLFVTCEA Q.5
DOMAIN PIOTEASE PRE-PRO LEADER
Factor IX
FIG. 3. Amino acid sequence and tentative structuresfor humanprepro-protein C and preprofactorIX. Protein Cis shownwithout the
Lys-Argdipeptide,whichconnectsthelightandheavychains.Locationsof the seven introns (Athrough G)for each gene areindicated by solid bars. Amino acidsflankingknownproteolytic cleavagesitesarecircled.Theactive-sitehistidine, aspartic acid,andserineresidues are also circled.*,Potential carbohydratebindingsites. Theproposeddisulfide bonds have beenplacedbyanalogytothosein bovine prothrombin and
epidermal growthfactor. Thefirstamino acids in thelight chain,activationpeptide,andheavychainstartwith number 1 anddifferfrom that shown inFig. 2. Thefactor IX structurewasthat ofYoshitake et al. (12). y, y-carboxyglutamic acid;A,
3-hydroxyaspartic
acid.several
potential
intron/exon
splice
donor and acceptor sequenceswereidentified in the 5'noncoding region.
All theintron/exon splice junctions
were similarto the consensussequences
recently
summarizedby
Mount(20)
andfollow theG-T/A-G
rule ofBreathnach andChambon(21).Several
potential
"TATA" sequences were foundup-streamfromthe
preproleader
sequence in thegenefor humanprotein
C. The sequences of T-A-T-A-A-T-A(starting
atposition -1785)
and T-A-T-A-A-T-T(starting
atposition
-1853)
show the strongest homology with the consensus sequence of T-A-T-A- -A-A.Both, however,
lacknearby
"CAAT"sequencesupstream. If eitherof thesesequencesis associated with initiation oftranscription,
thenprotein
C would have eitheraverylong 5'noncoding sequenceor an additionalintron(s)
in the 5'noncoding region
of thegene.Two
polyadenylylation
orprocessing sequences ofA-T-T-A-A-A and A-A-T-A-A-A (22) were found 47 and 276 nucleotides downstream from the translation stop codon
(nucleotides
starting
at9022 and9251). The second of these also has a sequence of C-T-T-T-G starting 37 nucleotides downstream. This latter sequencecorresponds to theC-A-C-T-G
consensus sequence and also may be involved inpolyadenylylation
orcleavage
atthe 3' end ofthemRNA(23).The DNA sequence of
eight
separate cDNAsatthe 3' end indicates that polyadenylylation occurs with about equalfrequency
downstream from the two polyadenylylation orprocessing
sites(datanot shown).Thegene forprotein C contains twoAlu sequences (24), and bothare locatedinintronE (solid underline inFig. 2). Thefirst isa
complete
copywithanorientation of 3'to5'.It is flankedby
the directrepeat sequenceofT-C-T-T-T-C-A-G-G-G-A-A-C-T-T-T-C-T. The second Alu sequence is 30 nucleotidesafter the
flanking
repeatofthe
first and isapartial copyofanAlusequenceoriented5'to3'. This Alusequencelacks the
right
half of the Alu consensus sequence and is flankedby
the directrepeatofA-A-A-A-A-T-T-T. IntronEalso
contains
twodirectrepeatsof about 160nucleotides ofunknown significance (dashed underline in Fig. 2). These repeatsare about 93%homologous and start atnucleotides 5628 and 5800. They are separated by 10 nucleotides. A
computer comparison of this sequence with the National
Institutes of Healthsequencedata bank revealedno
signifi-canthomologywithpublishedsequences.
The cDNA sequence (9), along with that of the gene, provides the entire amino acid sequence for human
preproprotein
C(Fig.3Left).
These dataindicate that humanprotein C, like the other vitamin K-dependent coagulation
factors, is initially synthesized as a single-chain precursor
withapreproleadersequenceof42 aminoacids.This leader
sequenceshowsconsiderable amino acid sequencehomology
with thatrecentlydescribed forbovineproteinC (10). Based on
homology
with the leadersequenceof bovine proteinC and other-y-carboxylated
coagulationproteasesintheregionfrom -1 to -20, it is likely that this leader sequence is cleaved by a signal
peptidase
after the alanine residue at position -10. This wouldyield aprozymogenform witha highly basicpropeptide of nine residues. Processing tothe matureproteinthat circulatesinplasmainvolves additionalproteolytic cleavage after residues at -1, 155, and 157 to
remove the amino-terminal propeptide and the Lys-Arg
dipeptide
thatconnectsthelight and heavychains (9). The processing of the single chain is not complete, however,because about 5-15% of the proteinC in human plasma is
presentas asingle-chain molecule(25).
The amino acid composition of the mature protein C
circulating
in plasma was calculated as follows:Asp2sAsp
(OH)X Thr15
Ser30 Glu24 Gln13Glag Pro18
Gly33Ala2l
VaI26Met7Ilel6Leu43Tyr8Phel3Lys22Hisl7Arg23Trpl3Cys24,
inwhich Gla is
-carboxyglutamic
acid andAsp(J30H)
is f8-hydroxyaspartic acid. The molecularweightfor theprotein wascalculatedtobe47,456 withoutcarbohydrateandabout61,600with theaddition of23% carbohydrate (26). Four of thepotential carbohydratechains boundtoasparagineoccur
Proc. Natl. Acad. Sci. USA 82 (1985) 4677 atresidues 97 in thelight chain and at residues79, 144, and
160 in the heavy chain (Fig. 3).
The DNA sequence of the coding region for the genefor
human protein C agrees wellwiththatofthecDNAforhuman
protein C(9) except for thetriplet coding forAsp-214. Both
thegenomicsequence (GAT)andthe cDNA sequence(GAG) specify aspartic acid at this position. It is likely that the
discrepancy is due to either polymorphism or a cloning artifactat nucleotide 7228. Thegenomic DNA sequence and
thesequenceof longer cDNAmoleculeshaveshownthat the
amino acidatresidue64iscysteine rather thanglutamineas
previously reported (9). This discrepancy is likely to have resulted from anartifactualerrorintroducedinto thecDNA sequence adjacent tothe EcoRI linker used inconstructing
the Xgt1l cDNA library. This phenomenon has been ob-served in several other cDNAscharacterized inthis
labora-tory (unpublished results).
Protein C shows considerable amino acid sequence and structural homology with the other vitamin K-dependent
coagulationfactorsincludingprothrombin,factorVII, factor IX, and factor X. Factor IX, factor X, and protein C are
unusually similar in that they have common domain struc-tures throughout their molecules including a y-carboxyglu-tamic acid domain,twopotential growthfactordomains,an
activation peptide or connecting region, and a catalytic
domain (27). In prothrombin, the potential growth factor domains have been replaced bytwokringle structures. The
similarity between theseproteinsisalsoevidentatthe level
ofthe gene where protein C and factor IX show unusual
homology. This is illustrated in Fig. 3, which shows the
proposed domain structures and the seven introns in the genesfor these twoproteins.Inboth genes, the intronsoccur in essentially the samepositions throughout theamino acid
sequence ofthe two proteins. The similaritybetween these two genes is further reflected in the conservation ofsplice junction type. All seven introns in the gene for protein C
exhibit the same splice junction type as the intron in the correspondinglocationin the gene for factor IX(12). How-ever, a computer search of the DNA sequences within the
introns of thegenes in protein C and factor IX showed no significant homology, indicatingthat the sequences ofthese
regions ofthe genes are notconserved during evolution.
Thelocationsof the introns in the genes forproteinC and factorIX areprimarily between various functional domains
of the two proteins (Fig. 3). Exon II spans the highly conservedregionof the leadersequenceand the y-carboxy-glutamic acid domain. Exon III includes a stretch ofeight
amino acids whichconnect the y-carboxyglutamic acid and
growth factor domains. Exons IV and V each represent a potential growth factor domain, while exon VI covers a connectingregionthat includes the activationpeptide.Exons VIIandVIII coverthecatalytic domaintypicalofallserine proteases.
Thefirstthreeintronsin the gene for humanprothrombin (28) also occur in the same position in the amino acid sequenceasthose ofproteinC and factorIX.Inprothrombin,
however, the y-carboxyglutamic acidregion is followed
by
twokringlestructures,whichareunrelated in sequencetothe potential growthfactor domains ofproteinC and factor IX.
Afterthefirstthreeintrons,there appearstobenosimilarity in gene structurebetween thatof
prothrombin
and thoseof factorIXandprotein C.Thealignmentof intron boundariesin thegenes for
protein
C,factorIX,andprothrombinprovidesadditionalevidencefor the evolution ofthese genes from a common ancestral
precursor. This could have resulted from the joining of
numerous fragments of similar DNA sequences by a translocationevent(s) between chromosomes during evolu-tion. This could leadtotheformation ofagenecodingfora seine protease containing additional domains such as the
potential growth factor domains, kringle domains, and y-carboxyglutamicacid domains(12).
The authorsthank Drs. DominicChung,StevenLeytus,Takehiko
Koide,and Kotoku Kurachi forhelpfuldiscussions andadvice,and Dr. TomManiatis forkindlyprovidingthe humangenomic library
constructed inXCharon 4Abacteriophage.This workwassupported in partbyaresearch grant(HL16919)from the National Institutes of Health. D.C.F. wassupported byNational Institutes of Health
TrainingGrant GM 07270.
1. Stenflo, J.(1976)J. Biol. Chem.251, 355-363.
2. Kisiel, W., Ericsson, L. H. &Davie, E. W. (1976) Biochem-istry15, 4893-4900.
3. Esmon, C. T. & Owen, W.G. (1981) Proc. Natl.Acad. Sci. USA78,2249-2252.
4. Kisiel,W.,Canfield,W.M.,Ericsson,L. H. &Davie,E. W.
(1977) Biochemistry16, 5824-5831.
5. Marlar, R.A., Kleiss, A. J. & Griffin, J. (1982) Blood 59,
1067-1072.
6. Vehar,G. A. &Davie,E. W.(1980)Biochemistry 19,401-410. 7. Griffin, J. H., Evatt, B.,Zimmerman, T. S., Kleiss,A. J. &
Wideman,C. (1981)J. Clin.Invest. 68, 1370-1373.
8. Griffin, J. H., Mosher, D. F., Zimmerman, T. S. & Kleiss,
A.J. (1982)Blood60,261-264.
9. Foster,D.&Davie,E. W.(1984)Proc.Natl. Acad. Sci. USA
81,4766-4770.
10. Long, G. L.,Belagaje,R. M.&MacGillivray,R. T. A.(1984)
Proc.Natl. Acad. Sci. USA81, 5653-5656.
11. Anson, D. S., Choo, K. H., Rees, D.J. G., Giannell, F., Gould, J. A., Huddleston, J. A. & Brownlee, G. G. (1984) EMBO J.3, 1053-1060.
12. Yoshitake, S., Schach,B. G.,Foster,D. C., Davie,E. W. &
Kurachi,K. (1985)Biochemistry,in press.
13. Degen,S. J.F.,MacGillivray,R. T. A.&Davie,E. W.(1983) Biochemistry 22,2087-2097.
14. Maniatis, T., Hardison,R.C., Lacy, E., Lauer, J.,O'Connell,
C., Quon, D., Sim, G. K. & Efstratiadis, A. (1978) Cell 15,
687-702.
15. Woo,S. L.C. (1979)MethodsEnzymol. 68,381-395. 16. Silhavy,T. J., Berman,W. L.&Enquist,L. W.(1984)
Exper-iments with Gene Fusions (Cold Spring Harbor Laboratory, ColdSpring Harbor, NY),pp. 140-141.
17. Guo, L. H., Yang, R. C. A. & Wu, R. (1983) NucleicAcids Res. 11, 5521-5540.
18. Biggin,M.D.,Gibson,T. J.&Hong,G. F.(1983)Proc.Natl. Acad. Sci. USA80,3963-3965.
19. Larson,R. &Messing,J.(1982)Nucleic AcidsRes. 10,39-50. 20. Mount,S. M.(1982)NucleicAcidsRes. 10,459-472. 21. Breathnach, R. &Chambon, P. (1981)Annu. Rev.Biochem.
50,349-383.
22. Proudfoot, N. & Brownlee, G. (1981)Nature (London) 252, 359-362.
23. Berget, S. M. (1984)Nature(London)309, 179-181.
24. Deininger,P.L., Jolly,D.J., Rubin,C. M., Freidmann,T.& Schmid,C. W. (1981)J. Mol. Biol. 151, 17-33.
25. Miletich, J.P., Leykam, F. J. & Broze, G. J. (1983) Blood
Suppl. 1, 62, 306a.
26. Kisiel, W. & Davie, E. W. (1981) Methods Enzymol. 80, 320-332.
27. Banyai, L., Varadi, A. & Patthy, L. (1983)FEBS Lett. 163, 37-41.
28. Davie, E. W., Degen, S. J. F., Yoshitake, S. & Kurachi,K.
(1983)Dev.Biochem. 25,45-52.